Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (10): 2377-2388. DOI: 10.3778/j.issn.1673-9418.2308070

• Special Topic on Large Language Models and Knowledge Graphs •

大语言模型融合知识图谱的问答系统研究

张鹤译,王鑫,韩立帆,李钊,陈子睿,陈哲   

Research on Question Answering System on Joint of Knowledge Graph and Large Language Models

ZHANG Heyi, WANG Xin, HAN Lifan, LI Zhao, CHEN Zirui, CHEN Zhe   

  1. College of Intelligence and Computing, Tianjin University, Tianjin 300354, China
  2. Evidence-Based Medicine Center, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
  • Online: 2023-10-01 Published: 2023-10-01

Abstract: Large language models (LLMs), including ChatGPT, have shown outstanding performance in understanding and responding to human instructions and have had a profound impact on natural language question answering (Q&A). However, because they lack training on vertical-domain data, their performance in vertical domains is not ideal. In addition, their high hardware requirements make training and deploying LLMs difficult. To address these challenges, this paper takes traditional Chinese medicine formulas as an example application, collects and preprocesses domain-related data, and designs a vertical-domain Q&A system based on an LLM and a knowledge graph. The system has the following capabilities: (1) Information filtering: questions related to the vertical domain are selected and passed to the LLM for answering. (2) Professional Q&A: answers with more professional knowledge are generated from the LLM and a self-built knowledge base; compared with fine-tuning on professional data, this technique allows a vertical-domain large model to be deployed without retraining. (3) Extraction and conversion: by strengthening the information extraction ability of the LLM, structured knowledge is extracted from its generated natural language answers and matched against a professional knowledge graph for professional verification; structured knowledge can also be converted into readable natural language, achieving a deep integration of large models and knowledge graphs. Finally, the effect of the system is demonstrated, and its performance is verified from both subjective and objective perspectives through two experiments: a subjective evaluation by experts and an objective evaluation using multiple-choice questions.

Key words: large language model (LLM), knowledge graph, Q&A system, vertical domain, traditional Chinese medicine formulas
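
The filtering and extraction-conversion capabilities described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Triple`, `KNOWLEDGE_GRAPH`, `is_in_domain`, `verify`, and `verbalize` are hypothetical, the entities are placeholders rather than real formula data, the keyword-based filter is a stand-in for whatever filter the system actually uses, and the triples are assumed to have already been extracted from an LLM answer. The professional Q&A step (LLM plus knowledge base) is omitted.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One structured fact: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


# Toy knowledge graph with placeholder entities (assumption: not real data).
KNOWLEDGE_GRAPH = {
    Triple("FormulaA", "contains", "HerbX"),
    Triple("FormulaA", "contains", "HerbY"),
    Triple("FormulaA", "treats", "SymptomZ"),
}

# Every entity name that appears in the graph, used as a crude domain vocabulary.
DOMAIN_TERMS = {t.head for t in KNOWLEDGE_GRAPH} | {t.tail for t in KNOWLEDGE_GRAPH}


def is_in_domain(question: str) -> bool:
    """Information filtering: keep a question only if it mentions a known entity."""
    return any(term in question for term in DOMAIN_TERMS)


def verify(triples: List[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Split extracted triples into those found in the graph and those not found."""
    supported = [t for t in triples if t in KNOWLEDGE_GRAPH]
    unsupported = [t for t in triples if t not in KNOWLEDGE_GRAPH]
    return supported, unsupported


def verbalize(triple: Triple) -> str:
    """Convert one structured triple back into a readable sentence."""
    return f"{triple.head} {triple.relation} {triple.tail}."


if __name__ == "__main__":
    # Triples assumed to come from an LLM answer; the second one is not in the graph.
    extracted = [
        Triple("FormulaA", "contains", "HerbX"),
        Triple("FormulaA", "contains", "HerbQ"),
    ]
    supported, unsupported = verify(extracted)
    for t in supported:
        print("verified:", verbalize(t))
    for t in unsupported:
        print("not in graph:", verbalize(t))
```

Exact set membership keeps the sketch short; a real system would likely need entity normalization or fuzzy matching before comparing extracted triples with the graph.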