Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue (9): 2361-2369. DOI: 10.3778/j.issn.1673-9418.2406067
• Special Issue on Constructions and Applications of Large Language Models in Specific Domains •
JI Guiyang, WANG Peiyan, YU Zhuo
Online: 2024-09-01
Published: 2024-09-01
JI Guiyang, WANG Peiyan, YU Zhuo. Research on Knowledge Injection Method for Large Language Model Oriented to Process Specification Texts[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2361-2369.
[1] China National Committee for Terminology in Science and Technology. Mechanical engineering terminology, volume II[M]. 2nd ed. Beijing: China Science Publishing & Media Ltd., 2021.
[2] XING L, ZHANG C, ZHANG K Y, et al. Implementation strategy of unstructured data governance in aerospace equipment management: a discussion[J]. Aeronautic Standardization & Quality, 2022(2): 26-31.
[3] THIRUNAVUKARASU A J, TING D S J, ELANGOVAN K, et al. Large language models in medicine[J]. Nature Medicine, 2023, 29(8): 1930-1940.
[4] ZHANG X, YANG Q. XuanYuan 2.0: a large Chinese financial chat model with hundreds of billions parameters[C]//Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. New York: ACM, 2023: 4435-4439.
[5] DING Z J, YANG Z H, ZHANG J J, et al. Construction of process specification system based on job knowledge management[J]. Aerospace Industry Management, 2016(5): 58-61.
[6] HOULSBY N, GIURGIU A, JASTRZEBSKI S, et al. Parameter-efficient transfer learning for NLP[C]//Proceedings of the 36th International Conference on Machine Learning, Long Beach, Jun 9-15, 2019: 2790-2799.
[7] LI X L, LIANG P. Prefix-tuning: optimizing continuous prompts for generation[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2021: 4582-4597.
[8] LESTER B, AL-RFOU R, CONSTANT N. The power of scale for parameter-efficient prompt tuning[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 3045-3059.
[9] LIU X, JI K, FU Y, et al. P-tuning: prompt tuning can be comparable to fine-tuning across scales and tasks[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2022: 61-68.
[10] HU E J, WALLIS P, ALLEN-ZHU Z, et al. LoRA: low-rank adaptation of large language models[C]//Proceedings of the 10th International Conference on Learning Representations, Apr 25-29, 2022.
[11] ZAKEN E B, GOLDBERG Y, RAVFOGEL S. BitFit: simple parameter-efficient fine-tuning for transformer-based masked language-models[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2022: 1-9.
[12] SU Y, HAN X, ZHANG Z, et al. CokeBERT: contextual knowledge selection and embedding towards enhanced pre-trained language models[J]. AI Open, 2021, 2: 127-134.
[13] WANG X, GAO T, ZHU Z, et al. KEPLER: a unified model for knowledge embedding and pre-trained language representation[J]. Transactions of the Association for Computational Linguistics, 2021, 9: 176-194.
[14] LIU W, ZHOU P, ZHAO Z, et al. K-BERT: enabling language representation with knowledge graph[C]//Proceedings of the 2020 AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2020: 2901-2908.
[15] FU P, ZHANG Y, WANG H, et al. Revisiting the knowledge injection frameworks[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2023: 10983-10997.
[16] JIANG Z, ZHONG L, SUN M, et al. Efficient knowledge infusion via KG-LLM alignment[EB/OL]. [2024-07-03]. https://arxiv.org/abs/2406.03746.
[17] GUO T, YANG Q, WANG C, et al. KnowledgeNavigator: leveraging large language models for enhanced reasoning over knowledge graph[EB/OL]. [2024-04-14]. https://arxiv.org/abs/2312.15880.
[18] WU H, ZHANG Y, HAN Z, et al. Quartet Logic: a four-step reasoning (QLFR) framework for advancing short text classification[EB/OL]. [2024-04-14]. https://arxiv.org/abs/2401.03158.
[19] YE D, LIN Y, LI P, et al. A simple but effective pluggable entity lookup table for pre-trained language models[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2022: 523-529.
[20] ZHANG Z, ZENG Z, LIN Y, et al. Plug-and-play knowledge injection for pre-trained language models[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2023: 10641-10658.
[21] LEWIS P, PEREZ E, PIKTUS A, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks[C]//Advances in Neural Information Processing Systems 33, Dec 6-12, 2020: 9459-9474.
[22] JIANG Z, XU F F, GAO L, et al. Active retrieval augmented generation[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2023: 7969-7992.
[23] WANG Y, LI P, SUN M, et al. Self-knowledge guided retrieval augmentation for large language models[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2023: 10303-10315.
[24] JEONG S, BAEK J, CHO S, et al. Adaptive-RAG: learning to adapt retrieval-augmented large language models through question complexity[C]//Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2024: 7029-7043.
[25] XU S, PANG L, YU M, et al. Unsupervised information refinement training of large language models for retrieval-augmented generation[EB/OL]. [2024-04-14]. https://arxiv.org/abs/2402.18150.
[26] LIU J, HUANG X, CHEN Z, et al. DRAK: unlocking molecular insights with domain-specific retrieval-augmented knowledge in LLMs[EB/OL]. [2024-07-03]. https://arxiv.org/abs/2406.18535.
[27] WU C, LIN W, ZHANG X, et al. PMC-LLaMA: toward building open-source language models for medicine[EB/OL]. [2024-04-14]. https://arxiv.org/abs/2304.14454.
[28] SHI W, MIN S, LOMELI M, et al. In-context pretraining: language modeling beyond document boundaries[C]//Proceedings of the 12th International Conference on Learning Representations, Vienna, May 7-11, 2024.
[29] WANG J, WANG C, TAN C, et al. Knowledgeable in-context tuning: exploring and exploiting factual knowledge for in-context learning[C]//Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg: ACL, 2024: 3261-3280.
[30] DERNBACH S, AGARWAL K, ZUNIGA A, et al. GLaM: fine-tuning large language models for domain knowledge graph alignment via neighborhood partitioning and generative subgraph encoding[C]//Proceedings of the AAAI 2024 Spring Symposium Series, Stanford, Mar 25-27, 2024. Menlo Park: AAAI, 2024: 82-89.
[31] CHENG D, HUANG S, WEI F. Adapting large language models via reading comprehension[C]//Proceedings of the 12th International Conference on Learning Representations, Vienna, May 7-11, 2024.
[32] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2002: 311-318.
[33] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]//Proceedings of the 2004 Workshop on Text Summarization Branches Out. Stroudsburg: ACL, 2004: 74-81.
[1] XIANG Xiaowei, SHEN Yanguang, HU Minghao, YAN Tianwei, LUO Wei, LUO Zhunchen. Research on Science and Technology Policy and Regulation Q&A System Driven by Large Models[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2349-2360.
[2] LI Yifei, ZHANG Lingling, DONG Yuxuan, WANG Jiaxin, ZHONG Yujie, WEI Bifan. Large Language Model Augmentation and Feature Alignment Method for Few-Shot Continual Relation Extraction[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2326-2336.
[3] CHEN Longfei, GAO Xin, HOU Haotian, YE Chuyang, LIU Ya'ou, ZHANG Meihui. Application of Generative Large Language Models in Chinese Radiology Domain[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2337-2348.
[4] LUO Shijie, JIN Rize, HAN Shuzhen. Research on University Basic Knowledge Question-Answering Using Low-Rank Encoding to Optimize Large Language Model[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(8): 2156-2168.
[5] SHENG Lei, CHEN Xiliang, LAI Jun. Offline Multi-agent Reinforcement Learning Method Based on Latent State Distribution GPT[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(8): 2169-2179.
[6] ZHANG Qi, ZHONG Hao. Submodular Optimization Approach for Entity Summarization in Knowledge Graph Driven by Large Language Models[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(7): 1806-1813.
[7] FENG Jun, CHANG Yanghong, LU Jiamin, TANG Hailin, LYU Zhipeng, QIU Yuchun. Construction and Application of Knowledge Graph for Water Engineering Scheduling Based on Large Language Model[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(6): 1637-1647.
[8] WANG Runzhou, ZHANG Xinsheng. Medical Knowledge Graph Question-Answering System Based on Hybrid Dynamic Masking and Multi-strategy Fusion[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(10): 2770-2786.
[9] PEI Bingsen, LI Xin, JIANG Zhangtao, LIU Mingshuai. Research on Public Security Professional Small Sample Knowledge Extraction Method Based on Large Language Model[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(10): 2630-2642.
[10] LI Li, SHI Rongliang, GUO Xu, JIANG Hongxin. Diagnosis of Power System Defects by Large Language Models and Graph Neural Networks[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(10): 2643-2655.
[11] ZHANG Caike, LI Xiaolong, ZHENG Sheng, CAI Jiajun, YE Xiaozhou, LUO Jing. Research on Construction and Application of Knowledge Graph Based on Large Language Model[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(10): 2656-2667.
[12] LIANG Jia, ZHANG Liping, YAN Sheng, ZHAO Yubo, ZHANG Yawen. Research Progress of Named Entity Recognition Based on Large Language Model[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(10): 2594-2615.
[13] JI Xiangyu, WANG Xin, ZHANG Heyi, MENG Zhaopeng, ZHANG Junhua, ZHUANG Pengwei, JIA Yongzhe, XU Dawei. Knowledge Augmentation on Traditional Chinese Medicine Language Model[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(10): 2616-2629.
[14] ZHANG Heyi, WANG Xin, HAN Lifan, LI Zhao, CHEN Zirui, CHEN Zhe. Research on Question Answering System on Joint of Knowledge Graph and Large Language Models[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(10): 2377-2388.
[15] PENG Huang, ZENG Weixin, ZHOU Jie, TANG Jiuyang, ZHAO Xiang. Contrast Research of Representation Learning in Entity Alignment Based on Graph Neural Network[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(10): 2343-2357.