Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2023, Vol. 17, Issue (2): 324-341. DOI: 10.3778/j.issn.1673-9418.2208028

• Frontiers · Review •

Review of Chinese Named Entity Recognition Research

WANG Yingjie, ZHANG Chengye, BAI Fengbo, WANG Zumin, JI Changqing

  1. College of Information Engineering, Dalian University, Dalian, Liaoning 116622, China
  2. College of Physical Science and Technology, Dalian University, Dalian, Liaoning 116622, China
  3. Institute of Evidence Law and Forensic Science, China University of Political Science and Law, Beijing 100088, China
  • Online: 2023-02-01 Published: 2023-02-01

Abstract: With the rapid development of natural language processing (NLP) technologies, improving the accuracy of named entity recognition (NER), an upstream NLP task, is of great significance for downstream text processing tasks. However, the differences between Chinese and English (for example, Chinese text is written without spaces between words, so word boundaries must first be inferred) make it difficult to transfer the results of English NER research effectively to Chinese. This paper therefore analyzes the key issues in current Chinese NER research from four aspects. First, taking the development of NER as the main thread, it comprehensively discusses the advantages and disadvantages, common methods and research results of each stage. Second, it summarizes Chinese text preprocessing methods from the perspectives of sequence labeling, evaluation metrics, Chinese word segmentation methods and datasets. Third, focusing on methods for fusing Chinese character and word features, it summarizes current research from the perspectives of character fusion and word fusion, and discusses directions for optimizing current Chinese NER models. Finally, it analyzes the practical applications of Chinese NER in various fields. By reviewing current research on Chinese NER, this paper aims to help researchers gain a more comprehensive understanding of the research directions and significance of this task, and thus to provide a reference for proposing new methods and improvements.

Key words: named entity recognition, deep learning, feature fusion, evaluation metrics
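To make the "sequence labeling" and "evaluation metrics" topics above concrete, the following is a minimal sketch, assuming the character-level BIO tagging scheme and exact-match, entity-level scoring (an entity counts as correct only if both its boundaries and its type match). The function names `bio_to_spans` and `prf1`, and the example sentence with the placeholder name 张三, are hypothetical illustrations, not the method of this paper or the API of any evaluation library.

```python
from typing import List, Set, Tuple

# Hypothetical span representation: (start, end_exclusive, entity_type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode character-level BIO tags into entity spans (illustrative sketch)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        # Any tag that does not continue the open entity closes it.
        if not continues and start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        # "B-X" starts an entity; a stray "I-X" is leniently treated as a start too.
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label
    return spans


def prf1(gold_tags: List[str], pred_tags: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span-and-type matching."""
    assert len(gold_tags) == len(pred_tags), "tag sequences must align per character"
    gold = bio_to_spans(gold_tags)
    pred = bio_to_spans(pred_tags)
    tp = len(gold & pred)  # entities matching in both boundaries and type
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical example: 张三在大连大学工作 ("Zhang San works at Dalian University"),
# one tag per Chinese character.
chars = list("张三在大连大学工作")
gold = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O"]
# The prediction gets the person right but mislabels 大连 as a location.
pred = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O", "O", "O"]

print(prf1(gold, pred))  # (0.5, 0.5, 0.5)
```

Because scoring is at the entity level, the partially overlapping prediction 大连 (LOC) earns no credit against 大连大学 (ORG), even though several of its character tags agree with the gold sequence; this is why character-level accuracy can overstate NER quality.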