Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue (11): 2848-2871. DOI: 10.3778/j.issn.1673-9418.2401033

Frontiers · Surveys

Review of Text-Oriented Entity Relation Extraction Research

REN Anqi, LIU Lin, WANG Hailong, LIU Jing   

1. School of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
2. Computer Science Joint Innovation Laboratory, Inner Mongolia Normal University, Hohhot 010022, China
3. Library, Inner Mongolia University, Hohhot 010021, China
Online: 2024-11-01; Published: 2024-10-31


Abstract: Information extraction is the foundation of knowledge graph construction. Relation extraction, a key process and core step of information extraction, aims to locate entities in text and recognize the semantic relations between them. Improving the effectiveness of relation extraction therefore improves the quality of information extraction, which in turn affects knowledge graph construction and subsequent downstream tasks. Depending on the length of the text being processed, relation extraction can be divided into sentence-level and document-level relation extraction, and each has its own strengths and weaknesses in different application scenarios: sentence-level relation extraction suits applications with smaller datasets, whereas document-level relation extraction suits scenarios such as news event analysis and relation mining in long reports or articles. Unlike existing reviews of relation extraction, this paper first introduces the basic concepts of relation extraction and the development of the field in recent years, lists the datasets used for both levels of relation extraction, and summarizes their characteristics. It then discusses sentence-level and document-level relation extraction in turn, summarizes the advantages and disadvantages of each level, and analyzes the performance and limitations of representative models for each type of method. Finally, it summarizes the open problems in current research and discusses future directions for relation extraction.

Key words: information extraction, entity relation extraction, sentence-level relation extraction, document-level relation extraction, knowledge graph construction
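To make the sentence-level versus document-level distinction in the abstract concrete, the following is a minimal sketch, not a method from the paper. It assumes relations are represented as (head, relation, tail) triples, uses naive exact-substring matching as a hypothetical stand-in for entity recognition, and labels a triple "document-level" when its head and tail never co-occur in a single sentence. The function names `locate_mentions` and `extraction_level` and the relation labels are illustrative inventions.

```python
from __future__ import annotations

from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def locate_mentions(sentences: list[str], entities: list[str]) -> dict[str, set[int]]:
    # Hypothetical stand-in for entity recognition: an entity is "located" in
    # every sentence that contains its name as an exact substring.
    return {e: {i for i, s in enumerate(sentences) if e in s} for e in entities}


def extraction_level(triple: Triple, mentions: dict[str, set[int]]) -> str:
    # If head and tail co-occur in at least one sentence, a sentence-level
    # extractor can in principle see the evidence; otherwise the evidence is
    # spread across sentences and document-level reasoning is needed.
    shared = mentions.get(triple.head, set()) & mentions.get(triple.tail, set())
    return "sentence-level" if shared else "document-level"


document = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
]
entities = ["Marie Curie", "Warsaw", "Poland"]
# Hypothetical gold triples; the relation labels are illustrative only.
gold = [
    Triple("Marie Curie", "place_of_birth", "Warsaw"),
    Triple("Warsaw", "capital_of", "Poland"),
    Triple("Marie Curie", "country_of_birth", "Poland"),
]

mentions = locate_mentions(document, entities)
for t in gold:
    print(t.head, t.relation, t.tail, "->", extraction_level(t, mentions))
```

Here the first two triples are each supported by a single sentence, while (Marie Curie, country_of_birth, Poland) can only be inferred by combining both sentences. That kind of cross-sentence inference is what motivates document-level relation extraction.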
