Review of Knowledge-Enhanced Pre-trained Language Models

doi:10.3778/j.issn.1673-9418.2108105

Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (7): 1439-1461.DOI: 10.3778/j.issn.1673-9418.2108105

• Surveys and Frontiers • Previous Articles Next Articles

Review of Knowledge-Enhanced Pre-trained Language Models

HAN Yi¹, QIAO Linbo², LI Dongsheng², LIAO Xiangke²^,⁺()

1. College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410073, China
2. College of Computer, National University of Defense Technology, Changsha 410073, China

Received:2021-08-30 Revised:2022-03-22 Online:2022-07-01 Published:2022-07-25
Supported by:
the Open Fund of Science and Technology on Parallel and Distributed Processing Laboratory (PDL)(6142110200203);the Open Fund of Science and Technology on Parallel and Distributed Processing Laboratory (PDL)(WDZC20205500101)

知识增强型预训练语言模型综述

韩毅¹, 乔林波², 李东升², 廖湘科²^,⁺()

1.国防科技大学气象海洋学院,长沙 410073
2.国防科学大学计算机学院,长沙 410073

作者简介:韩毅（1993—）,男,山东青岛人,博士,讲师,主要研究方向为自然语言处理、知识图谱等。
HAN Yi, born in 1993, Ph.D., lecturer. His research interests include natural language processing, knowledge graph, etc.
乔林波（1987—）,男,重庆万州人,博士,助理研究员,主要研究方向为结构化稀疏学习、分布式优化、深度学习等。
QIAO Linbo, born in 1987, Ph.D., research assistant. His research interests include structured sparse learning, distributed optimization, deep learning, etc.
李东升（1978—）,男,安徽桐城人,博士,研究员,博士生导师,主要研究方向为并行与分布式计算、云计算、大规模数据管理等。
LI Dongsheng, born in 1978, Ph.D., professor, Ph.D. supervisor. His research interests include parallel and distributed computing, cloud computing, large-scale data management, etc.
廖湘科（1963—）,男,湖南涟源人,博士,研究员,博士生导师,主要研究方向为并行与分布式计算、高性能计算机系统、操作系统等。
LIAO Xiangke, born in 1963, Ph.D., professor, Ph.D. supervisor. His research interests include parallel and distributed computing, high-performance computer systems, operating systems, etc.
基金资助:
并行与分布处理重点实验室（PDL）科技开放基金(6142110200203);并行与分布处理重点实验室（PDL）科技开放基金(WDZC20205500101)

Abstract

Abstract:

The knowledge-enhanced pre-trained language models attempt to use the structured knowledge stored in the knowledge graph to strengthen the pre-trained language models, so that they can learn not only the general semantic knowledge from the free text, but also the factual entity knowledge behind the text. In this way, the enhanced models can effectively solve downstream knowledge-driven tasks. Although this is a promising research direction, the current works are still in the exploratory stage, and there is no comprehensive summary and systematic arrangement. This paper aims to address the lack of comprehensive reviews of this direction. To this end, on the basis of summarizing and sorting out a large number of relevant works, this paper firstly explains the background information from three aspects: the reasons, the advantages, and the difficulties of introducing knowledge, summarizes the basic concepts involved in the knowledge-enhanced pre-trained language models. Then, it discusses three types of knowledge enhancement methods: using knowledge to expand input features, using knowledge to modify model architecture, and using knowledge to constrain training tasks. Finally, it counts the scores of various knowledge enhanced pre-trained language models on several evaluation tasks, analyzes the performance, the current challenges, and possible future directions of knowledge-enhanced pre-trained language models.

Key words: knowledge graph, pre-trained language models, natural language processing

摘要：

知识增强型预训练语言模型旨在利用知识图谱中的结构化知识来强化预训练语言模型,使之既能学习到自由文本中的通用语义知识,又能够学习到文本背后的现实实体知识,从而有效应对下游知识驱动型任务。虽然该方向研究潜力巨大,但相关工作目前尚处初期探索阶段,并未出现全面的总结和系统的梳理。为填补该方向综述性文章的空白,在归纳整理大量相关文献的基础上,首先从引入知识的原因、引入知识的优势、引入知识的难点三方面说明了知识增强型预训练语言模型产生的背景信息,总结了其中涉及的基本概念;随后列举了利用知识扩充输入特征、利用知识改进模型架构以及利用知识约束训练任务等三大类知识增强方法;最后统计了各类知识增强型预训练语言模型在评估任务上的得分情况,分析了知识增强模型的性能指标、目前面临的困难挑战以及未来可能的发展方向。

关键词: 知识图谱, 预训练语言模型, 自然语言处理

CLC Number:

TP18

HAN Yi, QIAO Linbo, LI Dongsheng, LIAO Xiangke. Review of Knowledge-Enhanced Pre-trained Language Models[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1439-1461.

韩毅, 乔林波, 李东升, 廖湘科. 知识增强型预训练语言模型综述[J]. 计算机科学与探索, 2022, 16(7): 1439-1461.

Figures/Tables 7

References 96

[1]	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, Minneapolis, Jun 2-7, 2019. Stroudsburg: ACL, 2019: 4171-4186.
[2]	BIANCHI F, HOVY D. On the gap between adoption and understanding in NLP[C]// Findings of the Association for Computational Linguistics, Aug 1-6, 2021. Stroudsburg: ACL, 2021: 3895-3901.
[3]	索传军. 网络信息资源组织研究的新视角[J]. 图书馆情报工作, 2013, 57(7): 5-12.
	SUO C J. A new perspective for web resource organization research[J]. Library and Information Service, 2013, 57(7): 5-12.
[4]	LOGAN IV R L, LIU N F, PETERS M E, et al. Barack’s wife Hillary: using knowledge graphs for fact-aware language modeling[C]// Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Jul 28- Aug 2, 2019. Stroudsburg: ACL, 2019: 5962-5971.
[5]	JI S X, PAN S R, CAMBRIA E, et al. A survey on knowledge graphs: representation, acquisition, and applications[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(2): 494-514. DOI URL
[6]	ZHANG Z Y, HAN X, LIU Z Y, et al. ERNIE: enhanced language representation with informative entities[C]// Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Jul 28-Aug 2, 2019. Stroudsburg: ACL, 2019: 1441-1451.
[7]	SU Y S, HAN X, ZHANG Z Y, et al. CokeBERT: contextual knowledge selection and embedding towards enhanced pre-trained language models[J]. AI Open, 2021, 2: 127-134. DOI URL
[8]	MCCLOSKEY M, COHEN N J. Catastrophic interference in connectionist networks: the sequential learning problem[J]. Psychology of Learning and Motivation, 1989, 24: 109-165.
[9]	刘峤, 李杨, 段宏, 等. 知识图谱构建技术综述[J]. 计算机研究与发展, 2016, 53(3): 582-600.
	LIU Q, LI Y, DUAN H, et al. Knowledge graph construction techniques[J]. Journal of Computer Research and Development, 2016, 53(3): 582-600.
[10]	AMIT S. Introducing the knowledge graph[R]. America: Official Blog of Google, 2012.
[11]	MAHDISOLTANI F, BIEGA J, SUCHANEK F M. YAGO3: a knowledge base from multilingual wikipedias[C]// Procee-dings of the 7th Biennial Conference on Innovative Data Systems Research, Asilomar, Jan 4-7, 2015: 1-11.
[12]	VRANDEČIĆ D, KRÖTZSCH M. Wikidata: a free collaborative knowledgebase[J]. Communications of the ACM, 2014, 57(10): 78-85.
[13]	BOLLACKER K D, COOK R P, TUFTS P. Freebase: a shared database of structured general human knowledge[C]// Proceedings of the 22nd AAAI Conference on A.pngicial Intelligence, Vancouver, Jul 22-26, 2007. Menlo Park: AAAI, 2007: 1962-1963.
[14]	BIZER C, LEHMANN J, KOBILAROV G, et al. DBpedia—a crystallization point for the web of data[J]. Journal of Web Semantics, 2009, 7(3): 154-165. DOI URL
[15]	SWARTZ A. MusicBrainz: a semantic web service[J]. IEEE Intelligent Systems, 2002, 17(1): 76-77.
[16]	AHLERS D. Assessment of the accuracy of GeoNames gazetteer data[C]// Proceedings of the 7th Workshop on Geographic Information Retrieval, Orlando, Nov 5, 2013. New York: ACM, 2013: 74-81.
[17]	WISHART D S, KNOX C, GUO A C, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration[J]. Nucleic Acids Research, 2006, 34: 668-672.
[18]	WANG X Z, GAO T Y, ZHU Z C, et al. KEPLER: a unified model for knowledge embedding and pre-trained language representation[J]. Transactions of the Association for Computational Linguistics, 2021, 9: 176-194. DOI URL
[19]	HAO J, CHEN M, YU W, et al. Universal representation learning of knowledge bases by jointly embedding instances and ontological concepts[C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, Aug 4-8, 2019. New York: ACM, 2019: 1709-1719.
[20]	CAMBRIA E, SONG Y Q, WANG H X, et al. Semantic multidimensional scaling for open-domain sentiment analysis[J]. IEEE Intelligent Systems, 2014, 29(2): 44-51.
[21]	徐增林, 盛泳潘, 贺丽荣, 等. 知识图谱技术综述[J]. 电子科技大学学报, 2016, 45(4): 589-606.
	XU Z L, SHENG Y P, HE L R, et al. Review on knowledge graph techniques[J]. Journal of University of Electronic Science and Technology of China, 2016, 45(4): 589-606.
[22]	XIONG C Y, POWER R, CALLAN J. Explicit semantic ranking for academic search via knowledge graph embedding[C]// Proceedings of the 26th International Conference on World Wide Web, Perth, Apr 3-7, 2017. New York: ACM, 2017: 1271-1279.
[23]	DAI Z H, LI L, XU W. CFO: conditional focused neural question answering with large-scale knowledge bases[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Aug 7-12, 2016. Stroudsburg: ACL, 2016: 800-810.
[24]	ZHANG F Z, YUAN N J, LIAN D F, et al. Collaborative knowledge base embedding for recommender systems[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 353-362.
[25]	LIU Z B, NIU Z Y, WU H, et al. Knowledge aware conversation generation with explainable reasoning over augmented graphs[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, Nov 3-7, 2019. Stroudsburg: ACL, 2019: 1782-1792.
[26]	SPEER R, HAVASI C. ConceptNet 5: a large semantic network for relational knowledge[M]// GUREVYCHI, KIMJ. The People’s Web Meets NLP. Berlin, Heidelberg: Springer, 2013.
[27]	MILLER G A. WordNet: an electronic lexical database[M]. Cambridge: MIT Press, 1998.
[28]	QIU X P, SUN T X, XU Y G, et al. Pre-trained models for natural language processing: a survey[J]. Science China Technological Sciences, 2020, 63: 1872-1897. DOI URL
[29]	TURIAN J P, RATINOV L A, BENGIO Y. Word representations: a simple and general method for semi-supervised lear-ning[C]// Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Jul 11-16, 2010. Stroudsburg: ACL, 2010: 384-394.
[30]	MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]// Advances in Neural Information Processing Systems 26, Lake Tahoe, Dec 5-8, 2013. Red Hook: Curran Associates, 2013: 3111-3119.
[31]	PENNINGTON J, SOCHER R, MANNING C D. Glove: global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Oct 25-29, 2014. Stroudsburg: ACL, 2014: 1532-1543.
[32]	PETERS M E, NEUMANN M, LYYER M, et al. Deep contextualized word representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, New Orleans, Jun 1-6, 2018. Stroudsburg: ACL, 2018: 2227-2237.
[33]	RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[EB/OL]. [2020-09-26]. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
[34]	VASWANI A, SHAZEER N, PARMARN, et al. Attention is all you need[C]// Advances in Neural Information Processing Systems 30, Long Beach, Dec 4-9, 2017. Red Hook: Curran Associates, 2017: 5998-6008.
[35]	KURSUNCU U, GAUR M, SHETH A. Knowledge infused learning (K-IL): towards deep incorporation of knowledge in deep learning[J]. arXiv: 1912. 00512, 2019.
[36]	QIN Y, LIN Y, TAKANOBU R, et al. ERICA: improving entity and relation understanding for pre-trained language models via contrastive learning[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Aug 1-6, 2021. Stroudsburg: ACL, 2021: 3350-3363.
[37]	XIONG W H, DU J F, WANG W Y, et al. Pretrained encyclopedia: weakly supervised knowledge-pretrained language model[C]// Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Apl 26-30, 2020: 1-22.
[38]	ROSSET C, XIONG C, PHAN M, et al. Knowledge-aware language model pretraining[J]. arXiv: 2007. 00655, 2020.
[39]	XU S, LI H R, YUAN P, et al. K-PLUG: knowledge-injected pre-trained language model for natural language understanding and generation in E-commerce[C]// Findings of the Association for Computational Linguistics, Punta Cana, Nov 16-20, 2021. Stroudsburg: ACL, 2021: 1-17.
[40]	LIU W J, ZHOU P, ZHAO Z, et al. K-BERT: enabling language representation with knowledge graph[C]// Proceedings of the 2020 AAAI Conference on A.pngicial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 2901-2908.
[41]	SUN T, SHAO Y, QIU X, et al. CoLAKE: contextualized language and knowledge embedding[C]// Proceedings of the 34th AAAI Conference on A.pngicial Intelligence, the 32nd Innovative Applications of A.pngicial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in A.pngicial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 3660-3670.
[42]	YAMADA I, ASAI A, SHINDO H, et al. LUKE: deep contextualized entity representations with entity-aware self-attention[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov 16-20, 2020. Stroudsburg: ACL, 2020: 6442-6454.
[43]	PÖRNER N, WALTINGER U, SCHÜTZE H. E-BERT: efficient-yet-effective entity embeddings for BERT[C]// Findings of the Association for Computational Linguistics, Nov 16-20, 2020. Stroudsburg: ACL, 2020: 803-818.
[44]	LIU A L, DU J F, STOYANOV V. Knowledge-augmented language model and its application to unsupervised named-entity recognition[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, Minneapolis, Jun 2-7, 2019. Stroudsburg: ACL, 2019: 1142-1150.
[45]	LIU X, YIN D, ZHANG X, et al. OAG-BERT: pre-train heterogeneous entity-augmented academic language models[J]. arXiv: 2103. 02410, 2021.
[46]	ZHANG F J, LIU X, TANG J, et al. OAG: toward linking large-scale heterogeneous entity graphs[C]// Proceedings of the 25th ACM SIGKDD International Conference on Know-ledge Discovery and Data Mining, Anchorage, Aug 4-8, 2019. New York: ACM, 2019: 2585-2595.
[47]	LEE J, YOON W, KIM S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining[J]. Bioinformatics, 2020, 36(4): 1234-1240.
[48]	BELTAGY I, LO K, COHAN A. SciBERT: a pretrained language model for scie.pngic text[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, Nov 3-7, 2019. Stroudsburg: ACL, 2019: 3613-3618.
[49]	LO K, WANGL L L, NEUMANN M, et al. S2ORC: the semantic scholar open research corpus[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul 5-10, 2020. Stroudsburg: ACL, 2020: 4969-4983.
[50]	LAUSHER A, VULIC I, PONTI E M, et al. Specializing unsupervised pretraining models for word-level semantic similarity[C]// Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Dec 8-13, 2020: 1371-1383.
[51]	ZHANG Z S, WU Y W, ZHAO H, et al. Semantics-aware BERT for language understanding[C]// Proceedings of the 34th AAAI Conference on A.pngicial Intelligence, the 32nd Innovative Applications of A.pngicial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in A.pngicial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 9628-9635.
[52]	YE Z X, CHEN Q, WANG W, et al. Align, mask and select: a simple method for incorporating commonsense knowledge into language representation models[J]. arXiv: 1908. 06725, 2019.
[53]	CHEN W, SU Y, YAN X, et al. KGPT: knowledge-grounded pre-training for data-to-text generation[J]. arXiv: 2010. 02307, 2020.
[54]	RASHKIN H, SAP M, ALLAWAYE, et al. Event2Mind:commonsense inference on events, intents, and reactions[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Jul 15-20, 2020. Stroudsburg: ACL, 2020: 8635-8648.
[55]	SAP M, LE BRAS R, ALLAWAY E, et al. ATOMIC: an atlas of machine commonsense for if-then reasoning[C]// Proceedings of the 33rd AAAI Conference on A.pngicial Intelligence, the 31st Innovative Applications of A.pngicial Intelligence Conference, the 9th AAAI Symposium on Educational Advances in A.pngicial Intelligence, Honolulu, Jan 27-Feb 1, 2019. Menlo Park: AAAI, 2019: 3027-3035.
[56]	BOSSELUT A, RASHKIN H, SAP M, et al. COMET: commonsense transformers for automatic knowledge graph construction[C]// Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Jul 28- Aug 2, 2019. Stroudsburg: ACL, 2019: 4762-4779.
[57]	BORDES A, USUNIER N, GARCÍA-DURÁN A, et al. Translating embeddings for modeling multi-relational data[C]// Advances in Neural Information Processing Systems 26, Lake Tahoe, Dec 5-8, 2013. Red Hook: Curran Associates, 2013: 2787-2795.
[58]	HE B, ZHOU D, XIAO J H, et al. Integrating graph contextualized knowledge into pre-trained language models[C]// Findings of the Association for Computational Linguistics, Nov 16-20, 2020. Stroudsburg: ACL, 2020: 2281-2290.
[59]	SUN Y, WANG S, FENG S, et al. ERNIE 3.0: large-scale knowledge enhanced pre-training for language understanding and generation[J]. arXiv: 2107. 02137, 2021.
[60]	PETERS M E, NEUMANN M, LOGAN IV R L, et al. Knowledge enhanced contextual word representations[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, Nov 3-7, 2019. Stroudsburg: ACL, 2019: 43-54.
[61]	YU D, ZHU C, YANG Y, et al. Jaket: joint pre-training of knowledge graph and language understanding[J]. arXiv: 2010. 00796, 2020.
[62]	WANG R, TANG D, DUAN N, et al. K-adapter: infusing knowledge into pre-trained models with adapters[J]. arXiv: 2002. 01808, 2020.
[63]	JAWAHAR G, SAGOT B, SEDDAH D. What does BERT learn about the structure of language?[C]// Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Jul 28-Aug 2, 2019. Stroudsburg: ACL, 2019: 3651-3657.
[64]	LAUSCHER A, MAJEWSKA O, RIBEIRO F R, et al. Common sense or world knowledge? Investigating adapter-based knowledge injection into pretrained transformers[J]. arXiv: 2005. 11787, 2020.
[65]	GUU K, LEE K, TUNG Z, et al. Retrieval augmented language model pre-training[C]// Proceedings of the 37th International Conference on Machine Learning, Jul 13-18, 2020: 3929-3938.
[66]	YANG Z L, DAI Z H, YANG Y M, et al. XLNet: generalized autoregressive pretraining for language understanding[C]// Advances in Neural Information Processing Systems 32, Vancouver, Dec 8-14, 2019: 5754-5764.
[67]	LEWIS M, LIU Y H, GOYAL N, et al. BART:denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul 5-10, 2020. Stroudsburg: ACL, 2020: 7871-7880.
[68]	LIU Y, OTT M, GPYAL N, et al. RoBERTA: a robustly optimized BERT pretraining approach[J]. arXiv: 1907. 11692, 2019.
[69]	JOSHI M, CHEN D Q, LIU Y H, et al. SpanBERT: improving pre-training by representing and predicting spans[J]. Transactions of Association for Computer Linguistics, 2020, 8: 64-77. DOI URL
[70]	WANG W, BI B, YAN M, et al. StructBERT: incorporating language structures into pre-training for deep language understanding[C]// Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Apr 26-30, 2020: 1-10.
[71]	CLARK K, LUONG M T, LE Q V, et al. ELECTRA: pre-training text encoder as discriminators rather than generators[C]// Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Apr 26-30, 2020: 1-15.
[72]	SUN Y, WANG S, LI Y, et al. ERNIE: enhanced representation through knowledge integration[J]. arXiv: 1904. 09223, 2019.
[73]	LEVINE Y, LENZ B, DAGAN O, et al. SenseBERT: driving some sense into BERT[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul 5-10, 2020. Stroudsburg: ACL, 2020: 4656-4667.
[74]	YE D M, LIN Y K, DU J J, et al. Coreferential reasoning learning for language representation[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov 16-20, 2020. Stroudsburg: ACL, 2020: 7170-7186.
[75]	MINTZ M, BILLS S, SNOW R, et al. Distant supervision for relation extraction without labeled data[C]// Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore, Aug 2-7, 2009. Stroudsburg: ACL, 2009: 1003-1011.
[76]	李涓子, 侯磊. 知识图谱研究综述[J]. 山西大学学报(自然科学版), 2017, 40(3): 454-459.
	LI J Z, HOU L. Reviews on knowledge graph research[J]. Journal of Shanxi University (Natural Science Edition), 2017, 40(3): 454-459.
[77]	WANG Z, ZHANG J W, FENG J L, et al. Knowledge graph embedding by translating on hyperplanes[C]// Proceedings of the 28th AAAI Conference on A.pngicial Intelligence, Québec City, Jul 27-31, 2014. Menlo Park: AAAI, 2014: 1112-1119.
[78]	JI G L, HE S Z, XU L H, et al. Knowledge graph embedding via dynamic mapping matrix[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Beijing, Jul 26-31, 2015. Stroudsburg: ACL, 2015: 687-696.
[79]	FÉVRY T, SOARES L B, FITZGERALD N, et al. Entities as experts: sparse memory access with entity supervision[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov 16-20, 2020. Stroudsburg: ACL, 2020: 4937-4951.
[80]	VERGA P, SUN H, SOARES L B, et al. Facts as experts: adaptable and interpretable neural memory over symbolic knowledge[J]. arXiv: 2007. 00849, 2020.
[81]	SOARES L B, FITZGERALD N, LING J, et al. Matching the blanks: distributional similarity for relation learning[C]// Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Jul 28- Aug 2, 2019. Stroudsburg: ACL, 2019: 2895-2905.
[82]	PENG H, GAO T Y, HAN X, et al. Learning from context or names? An empirical study on neural relation extraction[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov 16-20, 2020. Stroudsburg: ACL, 2020: 3661-3672.
[83]	SUN Y, WANG S H, LI Y K, et al. ERNIE 2.0: a continual pre-training framework for language understanding[C]// Proceedings of the 34th AAAI Conference on A.pngicial Intelligence, the 32nd Innovative Applications of A.pngicial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in A.pngicial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 8968-8975.
[84]	ZHOU W, LEE D H, SELVAM R K, et al. Pre-training text-to-text transformers for concept-centric common sense[C]// Proceedings of the 9th International Conference on Learning Representations, Austria, May 3-7, 2021: 1-15.
[85]	RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. Journal of Machine Learning Research, 2020, 21: 140.
[86]	WANG A, SINGH A, MICHAEL J, et al. GLUE: a multi-task benchmark and analysis platform for natural language understanding[C]// Proceedings of the 7th International Conference on Learning Representations, New Orleans, May 6-9, 2019: 1-20.
[87]	PETRONI F, ROCKTASCHEL T, RIEDEL S, et al. Language models as knowledge bases?[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, Nov 3-7, 2019. Stroudsburg: ACL, 2019: 2463-2473.
[88]	ELSAHAR H, VOUGIOUSKLIS P, REMACI A, et al. T-REx: a large scale alignment of natural language with knowledge base triples[C]// Proceedings of the 11th International Conference on Language Resources and Evaluation, Miyazaki, May 7-12, 2018: 3448-3452.
[89]	RAJPURKAR P, ZHANG J, LOPYTEV K, et al. SQuAD:100, 000+ questions for machine comprehension of text[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Nov 1-4, 2016. Stroudsburg: ACL, 2016: 2383-2392.
[90]	ROBERTS A, RAFFEL C, SHAZEER N. How much know-ledge can you pack into the parameters of a language model?[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov 16-20, 2020. Stroudsburg: ACL, 2020: 5418-5426.
[91]	FERRADA S, BUSTOS B, HOGAN A. IMGpedia. a linked dataset with content-based analysis of Wikimedia images[C]// LNCS 10588: Proceedings of the 16th International Semantic Web Conference, Vienna, Oct 21-25, 2017. Cham: Springer, 2017: 84-93.
[92]	WANG M, WANG H F, QI G L, et al. Richpedia: a large-scale, comprehensive multi-modal knowledge graph[J]. Big Data Research, 2020, 22: 100159. DOI URL
[93]	刘知远, 孙茂松, 林衍凯, 等. 知识表示学习研究进展[J]. 计算机研究与发展, 2016, 53(2): 247-261.
	LIU Z Y, SUN M S, LIN Y K, et al. Knowledge representation learning: a review[J]. Journal of Computer Research and Development, 2016, 53(2): 247-261.
[94]	官赛萍, 靳小龙, 贾岩涛, 等. 面向知识图谱的知识推理研究进展[J]. 软件学报, 2018, 29(10): 2966-2994.
	GUAN S P, JIN X L, JIA Y T, et al. Knowledge reasoning over knowledge graph: a survey[J]. Journal of Software, 2018, 29(10): 2966-2994.
[95]	DING X, LI Z, LIU T, et al. ELG: an event logic graph[J]. arXiv: 1907. 08015, 2019.
[96]	HAYASHI K, SHIMBO M. On the equivalence of holographic and complex embeddings for link prediction[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Jul 30-Aug 4, 2017. Stroudsburg: ACL, 2017: 554-559.

增强模型	基线模型	单句分类		自然语言推理任务				语义相似度任务			基线分数	模型分数	差值
增强模型	基线模型	CoLA (Mc)	SST-2 (Ac)	MNLI(m/mm)	QNLI (Ac)	RTE (Ac)	WNLI (Ac)	MRPC (Ac/F1)	STS-B (Pc/Sc)	QQP (Ac/F1)	基线分数	模型分数	差值
ERNIE 2.0_BASE	BERT_BASE	55.2	95.0	86.1/85.5	92.9	74.8	65.1	86.1/89.9	87.6/86.5	89.8/73.2	79.6	82.1	+2.5
ERNIE 2.0_LARGE	BERT_LARGE	63.5	95.6	88.7/88.8	94.6	80.2	67.8	87.4/90.2	91.2/90.6	90.1/73.8	82.1	84.8	+2.7
BERT-CS_base	BERT_BASE	54.3	93.6	84.7/83.9	91.2	69.5	—	—/95.9	—/86.4	—/72.1	79.6	81.3	+1.7
BERT-CS_large	BERT_LARGE	60.7	94.1	86.7/85.8	92.6	70.7	—	—/89.0	—/86.6	—/72.1	82.1	82.0	-0.1
SemBERT_BASE	BERT_BASE	57.8	93.5	84.4/84.0	90.9	69.3	90.9	—/88.2	71.8/—	—/87.3	79.6	81.8	+2.2
SemBERT_LARGE	BERT_LARGE	62.3	94.6	87.6/86.3	94.6	84.5	94.6	—/91.2	72.8/—	—/87.8	82.1	85.6	+3.5
CorefBERT_BASE	BERT_BASE	51.5	93.7	84.2/83.5	90.5	67.2	—	—/89.1	—/85.8	—/71.3	79.6	79.6	0
CorefBERT_LARGE	BERT_LARGE	62.0	94.7	86.9/85.7	92.9	70.0	—	—/89.3	—/86.3	—/71.7	82.1	82.2	+0.1
Thu-ERNIE	BERT_BASE	52.3	93.5	84.0/83.2	91.3	68.8	—	—/88.2	—/83.2	—/71.2	79.6	79.5	-0.1
LIBERT(2M)	BERT_BASE	35.3	90.8	79.9/78.8	87.2	63.6	—	86.6/81.7	82.6/—	69.3/88.2	75.3	76.7	+1.4
OM-ADAPT	BERT_BASE	53.5	93.4	84.2/83.7	90.6	68.2	—	—/87.9	—/85.9	—/71.1	79.6	79.8	+0.2
CN-ADAPT	BERT_BASE	49.8	93.9	84.2/83.3	90.6	69.7	—	—/88.9	—/85.8	—/71.6	79.6	79.8	+0.2
ERICA_BERT	BERT_BASE	57.9	92.8	84.5/84.7	90.7	69.6	—	—/89.5	—/89.5	—/88.3	83.0	83.1	+0.1
ERICA_RoBERTa	RoBERTa_BASE	63.5	95.0	87.5/87.5	92.6	78.5	—	—/91.5	—/90.7	—/91.6	86.4	86.5	+0.1
KEPLER-Wiki*	RoBERTa_BASE	63.6	94.5	87.2/86.5	92.4	85.2	—	—/89.3	—/91.2	—/91.7	86.4	86.8	+0.4
CoLAKE*	RoBERTa_BASE	63.4	94.6	87.4/87.2	92.4	77.9	—	—/90.9	—/90.8	—/92.0	86.4	86.3	-0.1

增强模型	基线模型	单句分类		自然语言推理任务				语义相似度任务			基线分数	模型分数	差值
增强模型	基线模型	CoLA (Mc)	SST-2 (Ac)	MNLI(m/mm)	QNLI (Ac)	RTE (Ac)	WNLI (Ac)	MRPC (Ac/F1)	STS-B (Pc/Sc)	QQP (Ac/F1)	基线分数	模型分数	差值
ERNIE 2.0_BASE	BERT_BASE	55.2	95.0	86.1/85.5	92.9	74.8	65.1	86.1/89.9	87.6/86.5	89.8/73.2	79.6	82.1	+2.5
ERNIE 2.0_LARGE	BERT_LARGE	63.5	95.6	88.7/88.8	94.6	80.2	67.8	87.4/90.2	91.2/90.6	90.1/73.8	82.1	84.8	+2.7
BERT-CS_base	BERT_BASE	54.3	93.6	84.7/83.9	91.2	69.5	—	—/95.9	—/86.4	—/72.1	79.6	81.3	+1.7
BERT-CS_large	BERT_LARGE	60.7	94.1	86.7/85.8	92.6	70.7	—	—/89.0	—/86.6	—/72.1	82.1	82.0	-0.1
SemBERT_BASE	BERT_BASE	57.8	93.5	84.4/84.0	90.9	69.3	90.9	—/88.2	71.8/—	—/87.3	79.6	81.8	+2.2
SemBERT_LARGE	BERT_LARGE	62.3	94.6	87.6/86.3	94.6	84.5	94.6	—/91.2	72.8/—	—/87.8	82.1	85.6	+3.5
CorefBERT_BASE	BERT_BASE	51.5	93.7	84.2/83.5	90.5	67.2	—	—/89.1	—/85.8	—/71.3	79.6	79.6	0
CorefBERT_LARGE	BERT_LARGE	62.0	94.7	86.9/85.7	92.9	70.0	—	—/89.3	—/86.3	—/71.7	82.1	82.2	+0.1
Thu-ERNIE	BERT_BASE	52.3	93.5	84.0/83.2	91.3	68.8	—	—/88.2	—/83.2	—/71.2	79.6	79.5	-0.1
LIBERT(2M)	BERT_BASE	35.3	90.8	79.9/78.8	87.2	63.6	—	86.6/81.7	82.6/—	69.3/88.2	75.3	76.7	+1.4
OM-ADAPT	BERT_BASE	53.5	93.4	84.2/83.7	90.6	68.2	—	—/87.9	—/85.9	—/71.1	79.6	79.8	+0.2
CN-ADAPT	BERT_BASE	49.8	93.9	84.2/83.3	90.6	69.7	—	—/88.9	—/85.8	—/71.6	79.6	79.8	+0.2
ERICA_BERT	BERT_BASE	57.9	92.8	84.5/84.7	90.7	69.6	—	—/89.5	—/89.5	—/88.3	83.0	83.1	+0.1
ERICA_RoBERTa	RoBERTa_BASE	63.5	95.0	87.5/87.5	92.6	78.5	—	—/91.5	—/90.7	—/91.6	86.4	86.5	+0.1
KEPLER-Wiki*	RoBERTa_BASE	63.6	94.5	87.2/86.5	92.4	85.2	—	—/89.3	—/91.2	—/91.7	86.4	86.8	+0.4
CoLAKE*	RoBERTa_BASE	63.4	94.6	87.4/87.2	92.4	77.9	—	—/90.9	—/90.8	—/92.0	86.4	86.3	-0.1

增强模型	基线模型	数据集	评测指标	基线得分/%	模型得分/%	差值/%
K-APDATER	RoBERTa_LARGE	LAMA-Google-RE	P@1	4.8	7.0	+2.2
K-APDATER	RoBERTa_LARGE	LAMA-UHN-Google-RE	P@1	2.5	3.7	+1.2
K-APDATER	RoBERTa_LARGE	LAMA-T-REx	P@1	27.1	29.1	+2.0
K-APDATER	RoBERTa_LARGE	LAMA-UHN-T-REx	P@1	20.1	23.0	+2.9
E-BERT-concat	BERT_BASE	LAMA-Google-RE+LAMA-T-REx	Hits@1	22.3	32.6	+10.3
E-BERT-concat	BERT_BASE	LAMA-UHN	Hits@1	20.2	31.1	+10.9
CoLAKE	RoBERTa_BASE	LAMA-Google-RE	P@1	5.3	9.5	+4.2
CoLAKE	RoBERTa_BASE	LAMA-UHN-Google-RE	P@1	2.2	4.9	+2.7
CoLAKE	RoBERTa_BASE	LAMA-T-REx	P@1	24.7	28.8	+4.1
CoLAKE	RoBERTa_BASE	LAMA-UHN-T-REx	P@1	17.0	20.4	+3.4
CALM	T5_Base	LAMA-ConceptNet	MRR	11.5	12.1	+0.6
CALM	T5_Base	LAMA-ConceptNet	P@1	5.9	6.5	+0.6
CALM	T5_Base	LAMA-ConceptNet	P@10	21.6	22.5	+0.9
KEPLER-Wiki	RoBERTa_BASE	LAMA-Google-RE	P@1	5.3	7.3	+2.0
KEPLER-Wiki	RoBERTa_BASE	LAMA-SQuAD	P@1	9.1	14.3	+5.2
KEPLER-W+W	RoBERTa_BASE	LAMA-ConceptNet	P@1	17.6	19.5	-1.9
KEPLER-W+W	RoBERTa_BASE	LAMA-UHN-Google-RE	P@1	2.2	4.1	+1.9
KALM	GPT-2	LAMA-Google-RE	P@1	4.9	5.4	+0.5
KALM	GPT-2	LAMA-T-REx	P@1	15.7	26.0	+10.3
KALM	GPT-2	LAMA-ConceptNet	P@1	9.7	10.7	+1.0
KALM	GPT-2	LAMA-SQuAD	P@1	5.9	11.9	+6.0
EaE	BERT_BASE	LAMA-ConceptNet	P@1	15.6	10.7	-4.9
EaE	BERT_BASE	LAMA-Google-RE	P@1	9.8	9.4	-0.4
EaE	BERT_BASE	LAMA-T-REx	P@1	31.1	37.4	+6.3
EaE	BERT_BASE	LAMA-SQuAD	P@1	14.1	22.4	+8.3

增强模型	基线模型	数据集	评测指标	基线得分/%	模型得分/%	差值/%
K-APDATER	RoBERTa_LARGE	LAMA-Google-RE	P@1	4.8	7.0	+2.2
K-APDATER	RoBERTa_LARGE	LAMA-UHN-Google-RE	P@1	2.5	3.7	+1.2
K-APDATER	RoBERTa_LARGE	LAMA-T-REx	P@1	27.1	29.1	+2.0
K-APDATER	RoBERTa_LARGE	LAMA-UHN-T-REx	P@1	20.1	23.0	+2.9
E-BERT-concat	BERT_BASE	LAMA-Google-RE+LAMA-T-REx	Hits@1	22.3	32.6	+10.3
E-BERT-concat	BERT_BASE	LAMA-UHN	Hits@1	20.2	31.1	+10.9
CoLAKE	RoBERTa_BASE	LAMA-Google-RE	P@1	5.3	9.5	+4.2
CoLAKE	RoBERTa_BASE	LAMA-UHN-Google-RE	P@1	2.2	4.9	+2.7
CoLAKE	RoBERTa_BASE	LAMA-T-REx	P@1	24.7	28.8	+4.1
CoLAKE	RoBERTa_BASE	LAMA-UHN-T-REx	P@1	17.0	20.4	+3.4
CALM	T5_Base	LAMA-ConceptNet	MRR	11.5	12.1	+0.6
CALM	T5_Base	LAMA-ConceptNet	P@1	5.9	6.5	+0.6
CALM	T5_Base	LAMA-ConceptNet	P@10	21.6	22.5	+0.9
KEPLER-Wiki	RoBERTa_BASE	LAMA-Google-RE	P@1	5.3	7.3	+2.0
KEPLER-Wiki	RoBERTa_BASE	LAMA-SQuAD	P@1	9.1	14.3	+5.2
KEPLER-W+W	RoBERTa_BASE	LAMA-ConceptNet	P@1	17.6	19.5	-1.9
KEPLER-W+W	RoBERTa_BASE	LAMA-UHN-Google-RE	P@1	2.2	4.1	+1.9
KALM	GPT-2	LAMA-Google-RE	P@1	4.9	5.4	+0.5
KALM	GPT-2	LAMA-T-REx	P@1	15.7	26.0	+10.3
KALM	GPT-2	LAMA-ConceptNet	P@1	9.7	10.7	+1.0
KALM	GPT-2	LAMA-SQuAD	P@1	5.9	11.9	+6.0
EaE	BERT_BASE	LAMA-ConceptNet	P@1	15.6	10.7	-4.9
EaE	BERT_BASE	LAMA-Google-RE	P@1	9.8	9.4	-0.4
EaE	BERT_BASE	LAMA-T-REx	P@1	31.1	37.4	+6.3
EaE	BERT_BASE	LAMA-SQuAD	P@1	14.1	22.4	+8.3

Review of Knowledge-Enhanced Pre-trained Language Models

知识增强型预训练语言模型综述

RichHTML

PDF

Knowledge

Abstract

Cite this article

share this article

Figures/Tables 7

References 96

Related Articles 15

Recommended Articles

Metrics

[1]	YU Huilin, CHEN Wei, WANG Qi, GAO Jianwei, WAN Huaiyu. Knowledge Graph Link Prediction Based on Subgraph Reasoning [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(8): 1800-1808.
[2]	SA Rina, LI Yanling, LIN Min. Survey of Question Answering Based on Knowledge Graph Reasoning [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(8): 1727-1741.
[3]	TIAN Xuan, CHEN Hangxue. Survey on Applications of Knowledge Graph Embedding in Recommendation Tasks [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(8): 1681-1705.
[4]	XIA Hongbin, XIAO Yifei, LIU Yuan. Long Text Generation Adversarial Network Model with Self-Attention Mechanism [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1603-1610.
[5]	GUO Xiaowang, XIA Hongbin, LIU Yuan. Hybrid Recommendation Model of Knowledge Graph and Graph Convolutional Network [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1343-1353.
[6]	DONG Wenbo, SUN Shiliang, YIN Minzhi. Research and Development of Medical Knowledge Graph Reasoning [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1193-1213.
[7]	WANG Baoliang, PAN Wencai. Two-Terminal Neighbor Information Fusion Recommendation Algorithm Based on Knowledge Graph [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1354-1361.
[8]	ZHANG Zichen, YUE Kun, QI Zhiwei, DUAN Liang. Incremental Construction of Time-Series Knowledge Graph [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(3): 598-607.
[9]	CHEN Gongchi, RONG Huan, MA Tinghuai. Abstractive Text Summarization Model with Coherence Reinforcement and No Ground Truth Dependency [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(3): 621-636.
[10]	LI Xiang, YANG Xingyao, YU Jiong, QIAN Yurong, ZHENG Jie. Double End Knowledge Graph Convolutional Networks for Recommender Systems [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 176-184.
[11]	ZHANG Chunpeng, GU Xiwu, LI Ruixuan, LI Yuhua, LIU Wei. Construction Method for Financial Personal Relationship Graphs Using BERT [J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 137-143.
[12]	CHEN Deguang, MA Jinlin, MA Ziping, ZHOU Jie. Review of Pre-training Techniques for Natural Language Processing [J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(8): 1359-1389.
[13]	WU Jiawei, SUN Yanchun. Recommendation System for Medical Consultation Integrating Knowledge Graph and Deep Learning Methods [J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(8): 1432-1440.
[14]	GAO Yang, LIU Yuan. Recommendation Algorithm Combining Knowledge Graph and Short-Term Preferences [J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(6): 1133-1144.
[15]	REN Jianhua, LI Jing, MENG Xiangfu. Document Classification Method Based on Context Awareness and Hierarchical Attention Network [J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(2): 305-314.