Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (7): 1439-1461. DOI: 10.3778/j.issn.1673-9418.2108105
Author biography:
HAN Yi (1993—), male, born in Qingdao, Shandong, China, Ph.D., lecturer; his main research interests include natural language processing and knowledge graphs.
HAN Yi, QIAO Linbo, LI Dongsheng, LIAO Xiangke+
Received: 2021-08-30
Revised: 2022-03-22
Online: 2022-07-01
Published: 2022-07-25
Abstract:
Knowledge-enhanced pre-trained language models aim to strengthen pre-trained language models with structured knowledge drawn from knowledge graphs, so that the models learn not only the general semantic knowledge contained in free text but also the real-world entity knowledge behind the text, and can therefore handle downstream knowledge-driven tasks effectively. Although this research direction has great potential, related work is still at an early, exploratory stage, and no comprehensive summary or systematic survey has yet appeared. To fill this gap, and building on a large body of related literature, this survey first explains the background of knowledge-enhanced pre-trained language models from three aspects: why knowledge is introduced, what advantages it brings, and what difficulties it raises, and summarizes the basic concepts involved. It then reviews three major categories of knowledge-enhancement methods: using knowledge to expand the input features, using knowledge to improve the model architecture, and using knowledge to constrain the training tasks. Finally, it collects the scores of various knowledge-enhanced pre-trained language models on evaluation tasks, and analyzes their performance, the challenges they currently face, and possible future research directions.
HAN Yi, QIAO Linbo, LI Dongsheng, LIAO Xiangke. Review of Knowledge-Enhanced Pre-trained Language Models[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1439-1461.
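The abstract groups knowledge-enhancement methods into three categories: expanding input features with knowledge, improving the model architecture with knowledge, and constraining the training tasks with knowledge. As a purely illustrative sketch of the first category (not the method of any particular model covered by the survey), the Python fragment below adds projected knowledge-graph entity embeddings to the token embeddings at linked mention positions; the class name, tensor layout, and the simple additive fusion are assumptions made only for this example.

```python
import torch
import torch.nn as nn

class EntityAugmentedEmbedding(nn.Module):
    """Illustrative only: fuse KG entity embeddings into token embeddings.

    Sketches the idea of the "expand input features with knowledge" category;
    real knowledge-enhanced models differ considerably in the details.
    """

    def __init__(self, token_emb: nn.Embedding, entity_emb: torch.Tensor):
        super().__init__()
        self.token_emb = token_emb                                   # (vocab, d_model)
        self.entity_emb = nn.Embedding.from_pretrained(entity_emb, freeze=True)
        d_model = token_emb.embedding_dim
        d_ent = entity_emb.size(1)
        self.proj = nn.Linear(d_ent, d_model)                        # map KG space -> LM space

    def forward(self, token_ids, entity_ids, entity_mask):
        # token_ids:   (batch, seq_len) sub-word ids
        # entity_ids:  (batch, seq_len) linked KG entity id per token (0 = no entity)
        # entity_mask: (batch, seq_len) 1 where a token belongs to a linked entity mention
        tok = self.token_emb(token_ids)                              # (batch, seq_len, d_model)
        ent = self.proj(self.entity_emb(entity_ids))                 # (batch, seq_len, d_model)
        return tok + entity_mask.unsqueeze(-1).to(ent.dtype) * ent   # inject knowledge at mentions
```

The other two categories in the abstract would instead modify the Transformer layers themselves or add knowledge-driven objectives to the pre-training loss.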
| Enhanced model | Baseline model | CoLA (Mc) | SST-2 (Ac) | MNLI-m/mm (Ac) | QNLI (Ac) | RTE (Ac) | WNLI (Ac) | MRPC (Ac/F1) | STS-B (Pc/Sc) | QQP (Ac/F1) | Baseline score | Model score | Diff. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ERNIE 2.0-base | BERT-base | 55.2 | 95.0 | 86.1/85.5 | 92.9 | 74.8 | 65.1 | 86.1/89.9 | 87.6/86.5 | 89.8/73.2 | 79.6 | 82.1 | +2.5 |
| ERNIE 2.0-large | BERT-large | 63.5 | 95.6 | 88.7/88.8 | 94.6 | 80.2 | 67.8 | 87.4/90.2 | 91.2/90.6 | 90.1/73.8 | 82.1 | 84.8 | +2.7 |
| BERT-CS-base | BERT-base | 54.3 | 93.6 | 84.7/83.9 | 91.2 | 69.5 | — | —/95.9 | —/86.4 | —/72.1 | 79.6 | 81.3 | +1.7 |
| BERT-CS-large | BERT-large | 60.7 | 94.1 | 86.7/85.8 | 92.6 | 70.7 | — | —/89.0 | —/86.6 | —/72.1 | 82.1 | 82.0 | -0.1 |
| SemBERT-base | BERT-base | 57.8 | 93.5 | 84.4/84.0 | 90.9 | 69.3 | 90.9 | —/88.2 | 71.8/— | —/87.3 | 79.6 | 81.8 | +2.2 |
| SemBERT-large | BERT-large | 62.3 | 94.6 | 87.6/86.3 | 94.6 | 84.5 | 94.6 | —/91.2 | 72.8/— | —/87.8 | 82.1 | 85.6 | +3.5 |
| CorefBERT-base | BERT-base | 51.5 | 93.7 | 84.2/83.5 | 90.5 | 67.2 | — | —/89.1 | —/85.8 | —/71.3 | 79.6 | 79.6 | 0 |
| CorefBERT-large | BERT-large | 62.0 | 94.7 | 86.9/85.7 | 92.9 | 70.0 | — | —/89.3 | —/86.3 | —/71.7 | 82.1 | 82.2 | +0.1 |
| THU-ERNIE | BERT-base | 52.3 | 93.5 | 84.0/83.2 | 91.3 | 68.8 | — | —/88.2 | —/83.2 | —/71.2 | 79.6 | 79.5 | -0.1 |
| LIBERT (2M) | BERT-base | 35.3 | 90.8 | 79.9/78.8 | 87.2 | 63.6 | — | 86.6/81.7 | 82.6/— | 69.3/88.2 | 75.3 | 76.7 | +1.4 |
| OM-ADAPT | BERT-base | 53.5 | 93.4 | 84.2/83.7 | 90.6 | 68.2 | — | —/87.9 | —/85.9 | —/71.1 | 79.6 | 79.8 | +0.2 |
| CN-ADAPT | BERT-base | 49.8 | 93.9 | 84.2/83.3 | 90.6 | 69.7 | — | —/88.9 | —/85.8 | —/71.6 | 79.6 | 79.8 | +0.2 |
| ERICA-BERT | BERT-base | 57.9 | 92.8 | 84.5/84.7 | 90.7 | 69.6 | — | —/89.5 | —/89.5 | —/88.3 | 83.0 | 83.1 | +0.1 |
| ERICA-RoBERTa | RoBERTa-base | 63.5 | 95.0 | 87.5/87.5 | 92.6 | 78.5 | — | —/91.5 | —/90.7 | —/91.6 | 86.4 | 86.5 | +0.1 |
| KEPLER-Wiki* | RoBERTa-base | 63.6 | 94.5 | 87.2/86.5 | 92.4 | 85.2 | — | —/89.3 | —/91.2 | —/91.7 | 86.4 | 86.8 | +0.4 |
| CoLAKE* | RoBERTa-base | 63.4 | 94.6 | 87.4/87.2 | 92.4 | 77.9 | — | —/90.9 | —/90.8 | —/92.0 | 86.4 | 86.3 | -0.1 |
Table 1 Score statistics on GLUE tasks (unit: %). CoLA and SST-2 are single-sentence classification tasks; MNLI, QNLI, RTE, and WNLI are natural language inference tasks; MRPC, STS-B, and QQP are semantic similarity tasks.
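The metric abbreviations in the column headers follow GLUE conventions: Mc is the Matthews correlation coefficient, Ac is accuracy, F1 is the F1 score, and Pc/Sc are the Pearson and Spearman correlations. As a minimal illustration of what each column measures (the labels and predictions below are made-up placeholders, not values from the table), the standard metrics can be computed with common libraries as follows:

```python
from sklearn.metrics import matthews_corrcoef, accuracy_score, f1_score
from scipy.stats import pearsonr, spearmanr

# Placeholder predictions/labels; real GLUE scores come from the official evaluation server.
y_true, y_pred = [1, 0, 1, 1, 0], [1, 0, 0, 1, 0]            # classification tasks
s_true, s_pred = [2.5, 4.0, 1.0, 3.5], [2.7, 3.8, 1.4, 3.0]  # STS-B similarity scores

mc = matthews_corrcoef(y_true, y_pred)   # CoLA (Mc)
ac = accuracy_score(y_true, y_pred)      # SST-2 / MNLI / QNLI / RTE / WNLI (Ac)
f1 = f1_score(y_true, y_pred)            # MRPC / QQP (F1)
pc, _ = pearsonr(s_true, s_pred)         # STS-B (Pc)
sc, _ = spearmanr(s_true, s_pred)        # STS-B (Sc)

print(f"Mc={mc:.3f} Ac={ac:.3f} F1={f1:.3f} Pc={pc:.3f} Sc={sc:.3f}")
```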
| Enhanced model | Baseline model | Dataset | Metric | Baseline score/% | Model score/% | Diff./% |
|---|---|---|---|---|---|---|
| K-ADAPTER | RoBERTa-large | LAMA-Google-RE | P@1 | 4.8 | 7.0 | +2.2 |
| K-ADAPTER | RoBERTa-large | LAMA-UHN-Google-RE | P@1 | 2.5 | 3.7 | +1.2 |
| K-ADAPTER | RoBERTa-large | LAMA-T-REx | P@1 | 27.1 | 29.1 | +2.0 |
| K-ADAPTER | RoBERTa-large | LAMA-UHN-T-REx | P@1 | 20.1 | 23.0 | +2.9 |
| E-BERT-concat | BERT-base | LAMA-Google-RE+LAMA-T-REx | Hits@1 | 22.3 | 32.6 | +10.3 |
| E-BERT-concat | BERT-base | LAMA-UHN | Hits@1 | 20.2 | 31.1 | +10.9 |
| CoLAKE | RoBERTa-base | LAMA-Google-RE | P@1 | 5.3 | 9.5 | +4.2 |
| CoLAKE | RoBERTa-base | LAMA-UHN-Google-RE | P@1 | 2.2 | 4.9 | +2.7 |
| CoLAKE | RoBERTa-base | LAMA-T-REx | P@1 | 24.7 | 28.8 | +4.1 |
| CoLAKE | RoBERTa-base | LAMA-UHN-T-REx | P@1 | 17.0 | 20.4 | +3.4 |
| CALM | T5-base | LAMA-ConceptNet | MRR | 11.5 | 12.1 | +0.6 |
| CALM | T5-base | LAMA-ConceptNet | P@1 | 5.9 | 6.5 | +0.6 |
| CALM | T5-base | LAMA-ConceptNet | P@10 | 21.6 | 22.5 | +0.9 |
| KEPLER-Wiki | RoBERTa-base | LAMA-Google-RE | P@1 | 5.3 | 7.3 | +2.0 |
| KEPLER-Wiki | RoBERTa-base | LAMA-SQuAD | P@1 | 9.1 | 14.3 | +5.2 |
| KEPLER-W+W | RoBERTa-base | LAMA-ConceptNet | P@1 | 19.5 | 17.6 | -1.9 |
| KEPLER-W+W | RoBERTa-base | LAMA-UHN-Google-RE | P@1 | 2.2 | 4.1 | +1.9 |
| KALM | GPT-2 | LAMA-Google-RE | P@1 | 4.9 | 5.4 | +0.5 |
| KALM | GPT-2 | LAMA-T-REx | P@1 | 15.7 | 26.0 | +10.3 |
| KALM | GPT-2 | LAMA-ConceptNet | P@1 | 9.7 | 10.7 | +1.0 |
| KALM | GPT-2 | LAMA-SQuAD | P@1 | 5.9 | 11.9 | +6.0 |
| EaE | BERT-base | LAMA-ConceptNet | P@1 | 15.6 | 10.7 | -4.9 |
| EaE | BERT-base | LAMA-Google-RE | P@1 | 9.8 | 9.4 | -0.4 |
| EaE | BERT-base | LAMA-T-REx | P@1 | 31.1 | 37.4 | +6.3 |
| EaE | BERT-base | LAMA-SQuAD | P@1 | 14.1 | 22.4 | +8.3 |
Table 2 Score statistics on LAMA knowledge-probing tasks
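P@k, Hits@k, and MRR in Table 2 are standard ranking metrics over a model's candidate fillers for each cloze-style LAMA query: P@k (Hits@k) is the fraction of queries whose gold answer appears among the top-k predictions, and MRR is the mean reciprocal rank of the gold answer. The sketch below shows only the generic definitions; the official LAMA evaluation additionally restricts predictions to a shared candidate vocabulary, which is omitted here.

```python
from typing import List

def precision_at_k(ranked_preds: List[List[str]], golds: List[str], k: int) -> float:
    """P@k / Hits@k: share of queries whose gold answer is in the top-k predictions."""
    hits = sum(gold in preds[:k] for preds, gold in zip(ranked_preds, golds))
    return hits / len(golds)

def mean_reciprocal_rank(ranked_preds: List[List[str]], golds: List[str]) -> float:
    """MRR: average of 1/rank of the gold answer (counted as 0 if it is not ranked)."""
    rr = 0.0
    for preds, gold in zip(ranked_preds, golds):
        if gold in preds:
            rr += 1.0 / (preds.index(gold) + 1)
    return rr / len(golds)

# Toy example: two cloze queries with ranked candidate answers.
preds = [["Paris", "Lyon", "Nice"], ["Rome", "Paris", "Milan"]]
golds = ["Paris", "Milan"]
print(precision_at_k(preds, golds, 1))     # 0.5  (gold ranked first for one of two queries)
print(mean_reciprocal_rank(preds, golds))  # (1/1 + 1/3) / 2 ≈ 0.667
```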