Few-Shot Named Entity Recognition with Prefix-Tuning

doi:10.3778/j.issn.1673-9418.2307060

Abstract

Few-shot named entity recognition (NER) is commonly approached with similarity-based metrics. To make fuller use of knowledge transfer within the model parameters, this paper proposes a prefix-tuning method for few-shot NER (P-NER). The feature vectors of the input text are passed to an embedding module for feature extraction. The vector parameters of the prefix prompts are concatenated to the front of the encoding-layer model, whose own parameters are kept fixed. The output of the encoding layer is decoded with a cross-entropy model. For each training sample, two sub-models are sampled, and the model's predictions are regularized by minimizing the relative entropy between the two sub-models. How well the predicted label of each word agrees with its actual label is measured by checking the output probability against the true-label probability, which yields the classification result. Experiments show that on the CoNLL2003 dataset the method achieves an average F1 score of 84.92% for in-domain few-shot entity recognition. In cross-domain few-shot entity recognition, it outperforms the other baseline methods on three datasets: MIT Movie, MIT Restaurant, and ATIS. The method thus significantly improves few-shot NER while tuning only 2.9% of the parameters that previous fine-tuning methods adjust.
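The training objective described in the abstract (a cross-entropy term plus a relative-entropy term between two sampled sub-models) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the two sub-models are two stochastic forward passes over the same tokens, that the relative entropy is averaged in both directions, and that a weight `alpha` balances the two terms. The function names `log_softmax`, `kl_divergence`, and `consistency_loss` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities for one token's label scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(log_p, log_q):
    """Relative entropy KL(p || q), computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def consistency_loss(logits_a, logits_b, gold_labels, alpha=1.0):
    """Average per-token loss over a sentence.

    logits_a, logits_b: label scores from the two sampled sub-models,
                        one list of scores per token.
    gold_labels:        index of the true label for each token.
    alpha:              weight of the relative-entropy term (an assumption).
    """
    total = 0.0
    for scores_a, scores_b, gold in zip(logits_a, logits_b, gold_labels):
        log_p, log_q = log_softmax(scores_a), log_softmax(scores_b)
        # Cross-entropy of both passes against the true label.
        ce = -(log_p[gold] + log_q[gold])
        # Symmetric relative entropy between the two predictions.
        kl = 0.5 * (kl_divergence(log_p, log_q) + kl_divergence(log_q, log_p))
        total += ce + alpha * kl
    return total / len(gold_labels)


# Two tokens, three candidate labels, scores from two sampled sub-models.
pass_a = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
pass_b = [[1.5, 0.7, -0.5], [0.0, 1.2, 0.6]]
loss = consistency_loss(pass_a, pass_b, gold_labels=[0, 1])
```

When both passes produce identical scores the relative-entropy term vanishes and only the cross-entropy remains, so the extra term penalizes disagreement between the two sub-models rather than the predictions themselves.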

Key words: named entity recognition (NER), few-shot learning, prompt learning


LYU Haixiao, LI Yihong, ZHOU Xiaoyi. Few-Shot Named Entity Recognition with Prefix-Tuning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(8): 2180-2189.

吕海啸, 李益红, 周晓谊. 前缀调优的少样本命名实体识别[J]. 计算机科学与探索, 2024, 18(8): 2180-2189.
