Named Entity Recognition Model Based on k-best Viterbi Decoupling Knowledge Distillation

doi:10.3778/j.issn.1673-9418.2211052

Abstract

Abstract: Knowledge distillation can improve the performance of named entity recognition (NER) models. However, the terms of the classical knowledge distillation loss are coupled, which leads to poor distillation results. To remove this coupling and make the distillation of output-layer (logit) knowledge more effective, this paper proposes k-best Viterbi decoupling knowledge distillation (kvDKD), a decoupled knowledge distillation method that uses k-best Viterbi decoding to improve computational efficiency and effectively improves model performance. In addition, data augmentation for deep-learning-based NER tends to introduce noise. A data augmentation method that combines data filtering with an entity rebalancing algorithm is therefore proposed. It aims to reduce the noise carried over from the original dataset and the mislabeling in the augmented data, which improves dataset quality and reduces overfitting. Building on these methods, a new NER model, NER-kvDKD (named entity recognition model based on k-best Viterbi decoupling knowledge distillation), is proposed. Comparative experiments on the MSRA, Resume, Weibo, CLUENER and CoNLL-2003 datasets show that the proposed method improves the generalization ability of the model and effectively improves the performance of the student model.

Key words: named entity recognition (NER), knowledge distillation, k-best Viterbi decoding, data augmentation
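
For readers unfamiliar with k-best Viterbi decoding, the following is a minimal sketch of the decoding step only, written in plain Python. It is not the authors' implementation: the abstract does not describe how kvDKD feeds the k best paths into the distillation loss, and the function name `k_best_viterbi`, the score layout (`emissions[t][y]`, `transitions[i][j]`) and the toy scores are assumptions made for illustration. The idea is to keep, for every tag at every position, the k highest-scoring partial paths ending in that tag, instead of only the single best one.

```python
import heapq
from typing import List, Sequence, Tuple


def k_best_viterbi(
    emissions: Sequence[Sequence[float]],    # emissions[t][y]: score of tag y at position t
    transitions: Sequence[Sequence[float]],  # transitions[i][j]: score of moving from tag i to tag j
    k: int,
) -> List[Tuple[float, List[int]]]:
    """Return up to k highest-scoring tag sequences as (score, path), best first."""
    if not emissions or k <= 0:
        return []
    num_tags = len(emissions[0])
    # beams[y] holds up to k (score, path) hypotheses that end in tag y.
    beams = [[(emissions[0][y], [y])] for y in range(num_tags)]
    for t in range(1, len(emissions)):
        new_beams = []
        for y in range(num_tags):
            # Extend every kept hypothesis from every previous tag to tag y.
            candidates = [
                (score + transitions[prev][y] + emissions[t][y], path + [y])
                for prev in range(num_tags)
                for score, path in beams[prev]
            ]
            # Keep only the k best hypotheses ending in tag y.
            new_beams.append(heapq.nlargest(k, candidates, key=lambda c: c[0]))
        beams = new_beams
    # Merge the per-tag lists and keep the overall k best complete paths.
    finals = [hyp for beam in beams for hyp in beam]
    return heapq.nlargest(k, finals, key=lambda c: c[0])


# Toy example (hypothetical scores): two positions, two tags.
toy_emissions = [[1.0, 0.0], [0.0, 1.0]]
toy_transitions = [[0.0, 0.5], [0.2, -0.5]]
top2 = k_best_viterbi(toy_emissions, toy_transitions, k=2)
# top2 holds the paths [0, 1] (score 2.5) and then [0, 0] (score 1.0).
```

With k = 1 this reduces to ordinary Viterbi decoding. Storing whole paths in each hypothesis keeps the sketch short; a production decoder would more likely store back-pointers.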

ZHAO Honglei, TANG Huanling, ZHANG Yu, SUN Xueyuan, LU Mingyu. Named Entity Recognition Model Based on k-best Viterbi Decoupling Knowledge Distillation[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(3): 780-794.
