Named Entity Recognition Optimization on DBpedia Spotlight

doi:10.3778/j.issn.1673-9418.1607015

Journal of Frontiers of Computer Science and Technology, 2017, Vol. 11, Issue (7): 1044-1055.

FU Yuxin1,2, WANG Xin1,2+, FENG Zhiyong2,3, XU Qiang1,2

1. School of Computer Science and Technology, Tianjin University, Tianjin 300354, China
2. Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300354, China
3. School of Computer Software, Tianjin University, Tianjin 300354, China

Online: 2017-07-01  Published: 2017-07-07

Abstract

Named entity recognition bridges the gap between knowledge bases and natural language, and supports research on keyword extraction, machine translation, and topic detection and tracking, among other tasks. Based on an analysis of current research on named entity recognition, this paper proposes a general-purpose optimization scheme for named entity recognition. First, it designs and implements an incremental extension method based on a candidate set, which reduces the dependence on the training set. Second, it uses the pointwise mutual information ratio to select features from entity contexts, which substantially shrinks the context space while improving annotation performance. Finally, it presents a second-pass disambiguation method based on topic vectors, which further improves annotation precision. Extensive comparison experiments on DBpedia Spotlight, a widely used open-source named entity recognition system, show that the proposed optimization scheme achieves better performance than existing systems.

Key words: named entity recognition, linked data, DBpedia Spotlight
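The abstract does not define the pointwise mutual information ratio used for context feature selection. As a rough illustration only, the sketch below ranks each entity's context words by plain pointwise mutual information, PMI(e, w) = log(P(e, w) / (P(e)P(w))), and keeps the highest-scoring ones. Using plain PMI as a stand-in for the paper's ratio, the `min_count`, `min_pmi` and `top_k` cut-offs, and the function name `select_context_features` are all assumptions; none of this is the authors' implementation or part of DBpedia Spotlight.

```python
import math
from collections import Counter, defaultdict


def select_context_features(occurrences, top_k=3, min_count=2, min_pmi=0.0):
    """Keep, for each entity, up to top_k context words ranked by PMI.

    Hypothetical helper (plain PMI, not the paper's exact "PMI ratio").
    occurrences: iterable of (entity, context_tokens), one item per mention.
    min_count:   skip (entity, word) pairs seen fewer times; PMI tends to
                 over-reward very rare co-occurrences.
    min_pmi:     keep only words whose PMI is strictly greater than this.
    Entities with no surviving words are omitted from the result.
    """
    # Count how often each word appears in the context of each entity.
    pair_counts = Counter()
    for entity, tokens in occurrences:
        for tok in tokens:
            pair_counts[(entity, tok)] += 1

    # Marginals derived from the same pair table, so P(e) and P(w) share
    # the denominator `total` with P(e, w).
    entity_counts, word_counts = Counter(), Counter()
    for (e, w), c in pair_counts.items():
        entity_counts[e] += c
        word_counts[w] += c
    total = sum(pair_counts.values())

    ranked = defaultdict(list)
    for (e, w), c in pair_counts.items():
        if c < min_count:
            continue
        # PMI(e, w) = log(P(e, w) / (P(e) * P(w))) with P = count / total.
        pmi = math.log(c * total / (entity_counts[e] * word_counts[w]))
        if pmi > min_pmi:
            ranked[e].append((pmi, w))

    # Highest PMI first; ties broken alphabetically for determinism.
    return {
        e: [w for _, w in sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_k]]
        for e, pairs in ranked.items()
    }


if __name__ == "__main__":
    mentions = [
        ("Apple_Inc.", ["iphone", "company", "the"]),
        ("Apple_Inc.", ["iphone", "stock", "the"]),
        ("Apple", ["fruit", "tree", "the"]),
        ("Apple", ["fruit", "juice", "the"]),
    ]
    print(select_context_features(mentions))
```

In this toy input "the" occurs with both entities in the same proportion, so its PMI is zero and it is discarded, while "iphone" and "fruit" survive. This is one simple way feature selection can shrink the context space without losing the words that actually separate the two senses.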
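The abstract also leaves open how topic vectors enter the second-pass disambiguation. One plausible reading, sketched here under explicit assumptions: each first-pass candidate entity carries a precomputed topic distribution (for example from an LDA model), the input text has one as well, and each candidate's first-pass score is linearly interpolated with the cosine similarity between the two vectors. The interpolation weight `alpha`, the function `rerank_by_topic`, and the `(uri, score, topic_vector)` tuple layout are hypothetical and not taken from the paper or from the DBpedia Spotlight API.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank_by_topic(candidates, doc_topics, alpha=0.5):
    """Hypothetical second-pass disambiguation over first-pass candidates.

    candidates: list of (uri, first_pass_score, topic_vector) tuples.
    doc_topics: topic vector of the text that contains the mention.
    alpha:      weight given to topical similarity, in [0, 1].
    Returns (uri, combined_score) pairs, best candidate first.
    """
    rescored = [
        # Linear interpolation of the first-pass score and topical fit.
        (uri, (1 - alpha) * score + alpha * cosine(topics, doc_topics))
        for uri, score, topics in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Two-topic toy space: index 0 ~ technology, index 1 ~ food.
    doc = [0.9, 0.1]
    cands = [
        ("http://dbpedia.org/resource/Apple", 0.6, [0.1, 0.9]),
        ("http://dbpedia.org/resource/Apple_Inc.", 0.4, [0.95, 0.05]),
    ]
    for uri, s in rerank_by_topic(cands, doc):
        print(f"{s:.3f}  {uri}")
```

Here the first pass prefers the fruit sense (0.6 against 0.4), but the text's topic vector points to technology, so with `alpha=0.5` Apple_Inc. moves to the top. With `alpha=0` the first-pass order is returned unchanged, so the weight controls how far topical evidence may overturn the first-pass choice.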

FU Yuxin, WANG Xin, FENG Zhiyong, XU Qiang. Named Entity Recognition Optimization on DBpedia Spotlight[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(7): 1044-1055.
