Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (11): 2565-2574. DOI: 10.3778/j.issn.1673-9418.2103027

• Artificial Intelligence •

Fine-Grained Entity Classification Method Fused with Memory Network

ZHOU Qi, TAO Wan+, KONG Chao, CUI Baiting

  1. School of Computer and Information, Anhui Polytechnic University, Wuhu, Anhui 241000, China
  • Received: 2021-03-26  Revised: 2021-05-14  Online: 2022-11-01  Published: 2021-05-18
  • Corresponding author: + E-mail: taowan@ahpu.edu.cn
  • About the authors: ZHOU Qi, born in 1997 in Suzhou, Anhui, M.S. candidate. Her research interests include cloud computing and big data processing.
    TAO Wan, born in 1972 in Wuhu, Anhui, M.S., professor. Her research interests include big data and data analysis.
    KONG Chao, born in 1986 in Shandong, Ph.D., associate professor. His research interests include web data management, streaming media data processing, social network analysis, and data mining.
    CUI Baiting, born in 1999 in Suzhou, Anhui. Her research interests include graph embedding and graph mining.
  • Supported by:
    National Natural Science Foundation for Youth of China (61902001); Key Natural Science Project of Education Department of Anhui Province (KJ2019A0158); Key Natural Science Project of Education Department of Anhui Province (KJ2019ZD15); National Innovation and Entrepreneurship Program for College Students (202010363098); National Innovation and Entrepreneurship Program for College Students (201910363076)

Fine-Grained Entity Classification Method Fused with Memory Network

ZHOU Qi, TAO Wan+, KONG Chao, CUI Baiting

  1. School of Computer and Information, Anhui Polytechnic University, Wuhu, Anhui 241000, China
  • Received: 2021-03-26  Revised: 2021-05-14  Online: 2022-11-01  Published: 2021-05-18
  • About author: ZHOU Qi, born in 1997, M.S. candidate. Her research interests include cloud computing and big data processing.
    TAO Wan, born in 1972, M.S., professor. Her research interests include big data and data analysis.
    KONG Chao, born in 1986, Ph.D., associate professor. His research interests include web data management, streaming data processing, social network analysis and data mining.
    CUI Baiting, born in 1999. Her research interests include graph embedding and graph mining.
  • Supported by:
    National Natural Science Foundation for Youth of China (61902001); Key Natural Science Project of Education Department of Anhui Province (KJ2019A0158); Key Natural Science Project of Education Department of Anhui Province (KJ2019ZD15); National Innovation and Entrepreneurship Program for College Students (202010363098); National Innovation and Entrepreneurship Program for College Students (201910363076)

Abstract:

Fine-grained entity classification is the task of assigning fine-grained type labels to a given entity mention. Most fine-grained entity classification approaches rely on distant supervision, which assigns to each mention all of the type labels attached to the corresponding entity in the knowledge base and thereby introduces irrelevant or overly specific noise labels. In distant supervision, a type label unrelated to the mention's context is regarded as an out-of-context noise label, while a fine-grained label whose assignment misrepresents the meaning of the entity in its context is regarded as an overly-specific noise label. To reduce the impact of such noise, earlier work resorted to manual annotation or heuristic pruning rules, but these approaches are inefficient and shrink the training set, degrading the overall performance of the classification model. By introducing a memory network, the classification model can deeply learn the correlation between the mention context and the type labels and strengthen the memory representation of the type labels associated with similar mention contexts, which effectively reduces the influence of out-of-context noise labels. At the same time, a modified hierarchical loss function is used to learn the hierarchical relations among type labels, thereby mitigating the negative impact of overly-specific noise labels. In addition, L2 regularization is combined to keep the model from overfitting the noise labels. Experimental results on public datasets show that the proposed method effectively alleviates the negative influence of out-of-context and overly-specific noise labels on the classification model and outperforms previous label-noise handling methods in accuracy, Macro F1 and Micro F1.

Key words: fine-grained entity classification, noise processing, memory network, type label
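
To make the memory-network component described above concrete, here is a minimal, self-contained sketch of a single-hop memory layer that relates a mention-context vector to one memory slot per type label; the class name MemoryTyper, the dimensions, and the single-hop read-out are illustrative assumptions, not the paper's released implementation.

```python
# Illustrative sketch only: a single-hop memory layer for fine-grained entity typing.
# Assumed names and dimensions (MemoryTyper, ctx_dim, mem_dim) are not taken from the paper.
import torch
import torch.nn as nn

class MemoryTyper(nn.Module):
    def __init__(self, ctx_dim: int, num_types: int, mem_dim: int = 128):
        super().__init__()
        self.type_memory = nn.Embedding(num_types, mem_dim)  # one memory slot per type label
        self.query_proj = nn.Linear(ctx_dim, mem_dim)        # map the mention context into memory space

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        # ctx: [batch, ctx_dim] encoding of the entity mention and its sentence context
        q = self.query_proj(ctx)                     # query vector, [batch, mem_dim]
        mem = self.type_memory.weight                # type-label memory, [num_types, mem_dim]
        attn = torch.softmax(q @ mem.t(), dim=-1)    # how strongly the context matches each type memory
        read = attn @ mem                            # memory read-out shared across similar contexts
        return (q + read) @ mem.t()                  # per-type logits, [batch, num_types]

# Usage with made-up sizes: 300-dimensional context encodings, 113 candidate types.
model = MemoryTyper(ctx_dim=300, num_types=113)
logits = model(torch.randn(4, 300))                  # multi-label scores for a batch of 4 mentions
```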

Abstract:

Fine-grained entity classification is the task of assigning fine-grained type labels to a given entity mention. Most existing fine-grained entity classification methods rely on distant supervision, which assigns to a mention all of the type labels attached to the corresponding entity in the knowledge base and thereby introduces irrelevant or overly specific noise labels. In distant supervision, type labels that are unrelated to the mention context are classified as out-of-context noise labels, while fine-grained labels whose assignment misrepresents the meaning of the entity in its context are classified as overly-specific noise labels. To reduce the impact of noise, manual labeling and heuristic pruning have been used in the past, but these approaches are inefficient and shrink the training set, which degrades the overall performance of the classification model. By introducing a memory network, the classification model can deeply learn the correlation between the entity mention context and the type labels and enhance the memory representation of the type labels corresponding to similar mention contexts, which effectively reduces the influence of out-of-context noise labels. At the same time, a modified hierarchical loss function is used to learn the hierarchical relations between type labels, so as to alleviate the negative impact of overly-specific noise labels. In addition, L2 regularization is applied to prevent the model from overfitting the noise labels. Experimental results on public datasets show that the proposed method can effectively alleviate the negative effects of out-of-context and overly-specific noise labels on the classification model, and it outperforms previous label-noise handling methods in terms of accuracy, Macro F1 and Micro F1.

Key words: fine-grained entity classification, noise processing, memory network, type label
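
The abstract's other two ingredients, the modified hierarchical loss and L2 regularization, can be illustrated with the rough sketch below; the ancestor-propagation rule, the depth-based down-weighting factor, and all names are assumptions made for illustration, not the authors' exact formulation.

```python
# Rough, illustrative sketch of a hierarchy-aware multi-label loss plus L2 weight decay.
# ancestors[t] lists the ancestor type indices of type t; depth[t] is t's depth in the type tree.
# Both structures and the gamma factor are assumptions, not taken from the paper.
import torch
import torch.nn.functional as F

def hierarchical_bce(logits: torch.Tensor, labels: torch.Tensor,
                     ancestors: dict, depth: torch.Tensor, gamma: float = 0.7) -> torch.Tensor:
    # logits, labels: [batch, num_types]; labels is the noisy 0/1 matrix from distant supervision.
    targets = labels.clone().float()
    for t, anc in ancestors.items():
        pos = labels[:, t].bool()
        for a in anc:
            targets[pos, a] = 1.0                    # a fine-grained positive also implies its ancestors
    per_type = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    weight = gamma ** depth.float()                  # deeper (more specific) labels contribute less,
    return (per_type * weight).mean()                # so an overly specific noisy label cannot dominate

# The L2 regularization mentioned in the abstract can be realized as weight decay in the optimizer:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```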

CLC Number: