Named Entity Recognition Based on Multi-scale Attention

doi:10.3778/j.issn.1673-9418.2210078

Abstract

Abstract: Accurate named entity recognition (NER) benefits many downstream tasks in natural language processing. Text contains a large number of nested semantics, which makes named entities hard to recognize and makes nested NER a difficult problem in the field. Previous studies extract features at a single scale and underuse boundary information, so they overlook many details available at other scales and produce incorrect or missing entities. To address these problems, a multi-scale attention method for named entity recognition (MSA-NER) is proposed. First, the BERT model produces representation vectors containing context information, and a BiLSTM network strengthens the contextual representation of the text. Second, the representation vectors are enumerated and concatenated to form a span information matrix, into which direction information is fused to obtain richer interaction information. Third, multi-head attention constructs multiple subspaces, and two-dimensional convolution optionally aggregates text information at different scales within each subspace, so that multi-scale feature fusion takes place in every attention layer. Finally, the fused matrix is used for span classification to identify named entities. Experimental results show that the proposed method reaches F1 scores of 81.7% and 86.8% on the GENIA and ACE2005 English datasets, respectively, and recognizes entities better than existing mainstream models.

Key words: named entity recognition (NER), nested semantics, multi-scale attention, convolutional neural network, subspace
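
The span-matrix and multi-scale steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The function names (build_span_matrix, conv2d_mean, multi_scale_fuse), the direction flag, the kernel sizes, and the use of plain averaging in place of learned attention and convolution weights are all assumptions made for illustration, and the short hand-written vectors stand in for BERT and BiLSTM outputs.

```python
# Illustrative sketch only: toy stand-ins for steps named in the abstract.
# All function names and design choices below are hypothetical.

def build_span_matrix(tokens):
    """Enumerate every (start, end) pair and concatenate the two token vectors.

    tokens: list of n vectors (lists of floats), e.g. encoder outputs.
    Cell [i][j] holds tokens[i] + tokens[j] plus one assumed direction flag:
    1.0 if i <= j (start precedes end), otherwise -1.0.
    """
    n = len(tokens)
    return [
        [tokens[i] + tokens[j] + [1.0 if i <= j else -1.0] for j in range(n)]
        for i in range(n)
    ]


def conv2d_mean(grid, k):
    """Fixed-weight k x k convolution (odd k, zero padding): a local average."""
    n, d, r = len(grid), len(grid[0][0]), k // 2
    out = [[[0.0] * d for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n:
                        for c in range(d):
                            out[i][j][c] += grid[a][b][c] / (k * k)
    return out


def multi_scale_fuse(grid, kernel_sizes=(1, 3, 5)):
    """Fuse several scales by averaging the per-scale outputs cell by cell."""
    scaled = [conv2d_mean(grid, k) for k in kernel_sizes]
    n, d = len(grid), len(grid[0][0])
    return [
        [[sum(s[i][j][c] for s in scaled) / len(scaled) for c in range(d)]
         for j in range(n)]
        for i in range(n)
    ]


tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 2-dimensional "embeddings"
span_matrix = build_span_matrix(tokens)
fused = multi_scale_fuse(span_matrix)
print(len(fused), len(fused[0]), len(fused[0][0]))  # grid size and feature width
```

In a real model each cell of the fused grid would feed a span classifier, and the convolution weights and the per-subspace choice of scales would be learned rather than fixed.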

TANG Ruixue, QIN Yongbin, CHEN Yanping. Named Entity Recognition Based on Multi-scale Attention[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(2): 506-515.
