Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (4): 713-733. DOI: 10.3778/j.issn.1673-9418.2107114

• Surveys and Frontiers •

Survey of Supervised Joint Entity Relation Extraction Methods

ZHANG Shaowei1, WANG Xin1,2,+, CHEN Zirui1, WANG Lin3, XU Dawei3, JIA Yongzhe1,3

  1. College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
    2. Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350, China
    3. Tianjin TechFantasy Co., Ltd., Tianjin 300457, China
  • Received: 2021-07-21  Revised: 2022-03-04  Online: 2022-04-01  Published: 2022-04-14
  • About the authors: ZHANG Shaowei, born in 1996, M.S. candidate. His research interests include knowledge representation learning and knowledge graph construction.
    WANG Xin, born in 1981, Ph.D., professor, Ph.D. supervisor, distinguished member of CCF. His research interests include knowledge graphs, graph databases and big data distributed processing.
    CHEN Zirui, born in 1998, M.S. candidate, student member of CCF. His research interests include knowledge representation learning and knowledge graph question answering.
    WANG Lin, born in 1981, Ph.D., professional member of CCF. His research interests include big data applications and artificial intelligence.
    XU Dawei, born in 1989, Ph.D., professional member of CCF. His research interests include artificial intelligence and natural language processing.
    JIA Yongzhe, born in 1987, Ph.D., professional member of CCF. His research interests include artificial intelligence and advanced manufacturing.
  • Supported by:
    Science and Technology Innovation 2030 “New Generation Artificial Intelligence” Major Project (2020AAA0108504); General Project of the National Natural Science Foundation of China (61972275)
  • Corresponding author (+): WANG Xin, E-mail: wangx@tju.edu.cn

Abstract:

As a core task of information extraction, joint entity relation extraction automatically identifies entities, their types, and the specific relations between them in unstructured or semi-structured text, providing foundational support for downstream tasks such as knowledge graph construction, intelligent question answering and semantic search. The traditional pipeline method decomposes the task into two independent subtasks, named entity recognition and relation extraction; because the two subtasks do not interact, the pipeline method suffers from problems such as error propagation. Recently, joint entity relation extraction has become a new research trend: by building a unified model in which the subtasks interact, it can further improve model performance. This paper surveys supervised joint entity relation extraction methods. According to how features are extracted, these methods fall into two categories: joint extraction based on feature engineering and joint extraction based on neural networks. First, joint extraction based on feature engineering is introduced, covering four methods (integer linear programming, card pyramid parsing, probabilistic graphical models and structured prediction), all of which rely on relatively complex feature engineering. Second, joint extraction based on neural networks is presented; because these methods extract features automatically, they have gradually become the mainstream approach to joint extraction, and they comprise two types, parameter sharing and joint decoding. Third, seven commonly used datasets and the evaluation metrics of supervised joint entity relation extraction are described, and different joint extraction methods are compared and analyzed experimentally. Finally, future research directions for joint entity relation extraction are discussed.
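
The abstract refers to evaluation metrics without naming them. As a hedged illustration, assuming the widely used triple-level micro precision, recall and F1 under strict matching (a predicted triple counts as correct only if both entity spans, both entity types and the relation type match a gold triple), the scoring can be sketched in Python as below. The names `Entity`, `Triple` and `strict_prf`, the token-offset convention and the relation labels in the example are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Set, Tuple


class Entity(NamedTuple):
    start: int  # index of the first token of the mention (inclusive)
    end: int    # index one past the last token (exclusive)
    etype: str  # entity type label, e.g. "PER" (hypothetical label set)


class Triple(NamedTuple):
    head: Entity
    relation: str  # relation type label (hypothetical label set)
    tail: Entity


def strict_prf(gold: List[Set[Triple]], pred: List[Set[Triple]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all sentences.

    NamedTuples compare field by field, so a predicted triple matches a gold
    triple only if both spans, both entity types and the relation all agree.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = n_pred = n_gold = 0
    for gold_set, pred_set in zip(gold, pred):
        tp += len(gold_set & pred_set)  # exact (strict) matches in this sentence
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Tokens: Steve(0) Jobs(1) founded(2) Apple(3) in(4) Cupertino(5) .(6)
    jobs = Entity(0, 2, "PER")
    apple = Entity(3, 4, "ORG")
    cupertino = Entity(5, 6, "LOC")
    gold = [{Triple(jobs, "founder_of", apple), Triple(apple, "located_in", cupertino)}]
    # The second prediction mistypes "Apple" as LOC, so strict matching rejects it.
    pred = [{Triple(jobs, "founder_of", apple),
             Triple(Entity(3, 4, "LOC"), "located_in", cupertino)}]
    print(strict_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Relaxed settings, for example ones that ignore entity types, would only change which fields are stored in each triple; the counting logic stays the same.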

Key words: joint extraction, feature engineering, neural network
