Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (9): 1954-1968. DOI: 10.3778/j.issn.1673-9418.2112109

• Surveys and Frontiers •

Review on Named Entity Recognition (命名实体识别方法研究综述)

LI Dongmei (李冬梅)1,2, LUO Sisi (罗斯斯)1,2, ZHANG Xiaoping (张小平)3,+, XU Fu (许福)1,2

  1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
  2. Engineering Research Center for Forestry-Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China
  3. Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
  • Received: 2021-12-29 Revised: 2022-04-29 Online: 2022-09-01 Published: 2022-09-15
  • Corresponding author: + E-mail: xiao_ping_zhang@139.com
  • About the authors: LI Dongmei, born in 1972, Ph.D., professor. Her research interests include natural language processing and knowledge graphs.
    LUO Sisi, born in 1993, M.S. candidate. Her research interests include natural language processing and knowledge graphs.
    ZHANG Xiaoping, born in 1969, Ph.D., professor-level senior engineer. Her research interests include data mining and artificial intelligence.
    XU Fu, born in 1979, Ph.D., professor. His research interests include remote sensing information processing, smart forestry, and smart gardens.
  • Supported by:
    Fundamental Research Funds for the Central Public Welfare Research Institutes (ZZ140319-W); National Natural Science Foundation of China (61772078)

Abstract:

Named entity recognition (NER) is the first key step of information extraction in natural language processing. The task aims to identify named entities in large volumes of unstructured text and classify them into predefined types, providing basic support for downstream tasks such as relation extraction, text summarization, and machine translation. This paper first outlines the definition of NER, its research difficulties, and the particularities of Chinese NER, and summarizes the public Chinese and English datasets and evaluation criteria commonly used for the task. Following the development history of the field, it then surveys existing methods in three groups: early methods based on rules and dictionaries, methods based on statistical machine learning, and methods based on deep learning. For each group, the paper summarizes the key ideas, advantages and disadvantages, and representative models, and it also summarizes the Chinese NER methods of each stage. In particular, it reviews the latest methods based on Transformer and on prompt learning, the two subcategories that represent the state of the art among deep learning-based NER methods. Finally, it summarizes the challenges facing NER research and discusses future research directions.

Key words: natural language processing, named entity recognition, machine learning, deep learning, relation extraction
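
As a rough illustration of the task described in the abstract (finding entity mentions in raw text and assigning each a predefined type), the following minimal Python sketch shows the general idea behind the early dictionary-based family of methods. It is not taken from the paper: the gazetteer entries, the type labels (PER, ORG, LOC), the function name `dictionary_ner`, and the example sentence are all hypothetical, and the greedy longest-match lookup is only one simple way such a system could be built.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer (assumption, not from the paper):
# maps a surface form to a predefined entity type.
GAZETTEER: Dict[str, str] = {
    "University of London": "ORG",
    "London": "LOC",
    "Ada Lovelace": "PER",
}


def dictionary_ner(text: str, gazetteer: Dict[str, str]) -> List[Tuple[str, str, int, int]]:
    """Return (mention, type, start, end) tuples found by greedy longest-match lookup.

    This plain substring matcher ignores word boundaries and ambiguity,
    which is one reason dictionary methods generalize poorly.
    """
    # Try longer entries first so "University of London" wins over "London".
    entries = sorted(gazetteer, key=len, reverse=True)
    found: List[Tuple[str, str, int, int]] = []
    pos = 0
    while pos < len(text):
        for entry in entries:
            if text.startswith(entry, pos):
                found.append((entry, gazetteer[entry], pos, pos + len(entry)))
                pos += len(entry)  # skip past the matched mention
                break
        else:
            pos += 1  # no entry starts here; move one character forward
    return found


sentence = "Ada Lovelace visited the University of London in London."
entities = dictionary_ner(sentence, GAZETTEER)
for mention, label, start, end in entities:
    print(f"{mention!r} -> {label} [{start}:{end}]")
```

A lookup like this can only find mentions that are already listed, so coverage depends entirely on the dictionary. The statistical and deep learning methods surveyed in the paper instead learn to label entities from annotated data.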

CLC Number: