Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (9): 2326-2336. DOI: 10.3778/j.issn.1673-9418.2406056

• Special Topic on Construction and Application of Vertical-Domain Large Models •

Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction

LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan   

  1. State Key Laboratory of Communication Content Cognition, Beijing 100733, China
    2. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
    3. Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi'an 710049, China
  • Online: 2024-09-01  Published: 2024-09-01

Abstract: Relation extraction, as a key task in natural language processing, plays a significant role in deepening language understanding, constructing knowledge graphs, and optimizing information retrieval systems. However, traditional supervised learning methods are ill-suited to real-world scenarios, where new relations continually emerge and large numbers of annotated examples are unavailable. Although the advent of large language models has significantly improved performance on many natural language processing tasks, they still cannot directly and effectively address the challenges of few-shot continual relation extraction. To fully leverage the semantic knowledge of large language models and thereby mitigate catastrophic forgetting and overfitting, a few-shot continual relation extraction method based on LLM-augmented representation alignment, LAFA (large language model augmentation and feature alignment), is proposed. Through strategies such as relation instance rewriting, semantic expansion, and relation-enhanced representation, LAFA effectively improves the model's adaptability to new relations and its retention of old knowledge while keeping data and computational costs low. Experiments on two relation extraction datasets, FewRel and TACRED, show that, compared with existing methods, LAFA performs favorably on few-shot continual relation extraction, achieving the best results in the incremental stages in particular. Ablation studies further reveal the significant contribution of each module to overall performance. Moreover, the inference efficiency and cost of LAFA are far lower than those of existing methods based on large language models, and it is highly extensible, adapting readily to a variety of language models.
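
The abstract above is the paper's own summary and gives no implementation detail, so the following is only a rough, hypothetical sketch of the kind of pipeline it describes: a handful of labelled instances per relation is expanded by an LLM-style rewriting step, each relation is summarized by a prototype vector, and prototypes of previously learned relations are simply retained across incremental tasks. The encoder, the rewrite_with_llm placeholder, and the nearest-prototype classifier below are illustrative assumptions, not LAFA's actual components.

```python
# Hypothetical sketch of few-shot continual relation extraction with
# LLM-style instance augmentation; this does not reproduce LAFA itself.
import hashlib
import numpy as np

DIM = 256

def encode(sentence: str) -> np.ndarray:
    """Toy sentence encoder: hashed bag-of-words, L2-normalized.
    A real system would use a pretrained language model encoder."""
    vec = np.zeros(DIM)
    for tok in sentence.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def rewrite_with_llm(sentence: str, n_variants: int = 2) -> list[str]:
    """Placeholder for LLM-based instance rewriting / semantic expansion.
    A real implementation would prompt an LLM to paraphrase the instance
    while preserving the two entities and their relation."""
    return [f"{sentence} (paraphrase {i})" for i in range(n_variants)]

class ContinualRelationExtractor:
    def __init__(self) -> None:
        # One prototype vector per relation seen so far.
        self.prototypes: dict[str, np.ndarray] = {}

    def learn_task(self, few_shot: dict[str, list[str]]) -> None:
        """Learn new relations from a few instances each; prototypes of
        previously learned relations are left untouched, which limits
        catastrophic forgetting."""
        for relation, instances in few_shot.items():
            augmented = list(instances)
            for s in instances:
                augmented.extend(rewrite_with_llm(s))
            embs = np.stack([encode(s) for s in augmented])
            self.prototypes[relation] = embs.mean(axis=0)

    def predict(self, sentence: str) -> str:
        """Nearest-prototype (dot-product) classification over all
        relations learned in any previous task."""
        emb = encode(sentence)
        return max(self.prototypes, key=lambda r: float(emb @ self.prototypes[r]))

if __name__ == "__main__":
    model = ContinualRelationExtractor()
    # Incremental task 1: two relations, one labelled instance each.
    model.learn_task({
        "founder_of": ["Steve Jobs founded Apple in 1976."],
        "capital_of": ["Paris is the capital of France."],
    })
    # Incremental task 2: a new relation arrives later; old ones are kept.
    model.learn_task({"employee_of": ["Alice works as an engineer at Acme Corp."]})
    print(model.predict("Berlin is the capital of Germany."))
```

Keeping old prototypes fixed while only adding new ones is one simple way to trade a little plasticity for stability at essentially no extra training cost, which mirrors the balance between adapting to new relations and retaining old knowledge that the abstract emphasizes.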

Key words: large language model (LLM), relation extraction, continual learning, few-shot learning