Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

• Academic Research •

Design of a University Research Management Question Answering System Integrating Knowledge Graph and Large Language Models

WANG Yong, QIN Jiajun, HUANG Yourui, DENG Jiangzhou   

  1. Key Laboratory of Electronic Commerce and Logistics, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Abstract: The “14th Five-Year Plan” encourages universities to take part in scenario innovation and to advance intelligent construction through artificial intelligence technology. Scientific research management is a crucial part of university management, yet existing research management systems struggle to meet users' personalized needs. To address this, and guided by the need to move university research management toward intelligence, this paper combines a knowledge graph, a traditional model, and large language models (LLMs) to build a new-generation question answering system for university research management. First, research-related knowledge is collected to construct a scientific research knowledge graph. Next, a multi-task model performs semantic parsing by carrying out intent classification and entity extraction simultaneously. Finally, the parsing results are used to generate query statements that retrieve information from the knowledge graph to answer routine questions; in parallel, LLMs are combined with the knowledge graph to help handle open-ended questions. Experiments on a dataset in which intents and entities are correlated show that the multi-task model achieves F1 scores of 0.958 for intent classification and 0.937 for entity recognition, outperforming the other comparison models and the single-task models. Cypher generation tests demonstrate that a custom prompt effectively elicits the emergent abilities of LLMs: LLM-based text-to-Cypher generation reaches an accuracy of 85.8%, effectively handling open-ended questions over the knowledge graph. Moreover, the question answering system built from the knowledge graph, the traditional model, and LLMs achieves an accuracy of 0.935, meeting the requirements of intelligent question answering.

Key words: knowledge graph, multi-task model, intent classification, named entity recognition, large language models
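The abstract describes two answer paths: routine questions are answered by filling a Cypher query from the multi-task model's intent and entities, while open-ended questions are handed to an LLM with a custom prompt that asks it to write Cypher. The Python sketch below illustrates that routing only, under stated assumptions: the intent labels, entity types, graph schema (Teacher, Project, Paper nodes), Cypher templates, and prompt wording are all hypothetical and are not taken from the paper. It does not call any model or database; it only returns either a parameterized query or a prompt string.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical output of a joint intent/entity model:
# one intent label plus (entity_type, surface_text) pairs.
@dataclass
class ParseResult:
    intent: str
    entities: list[tuple[str, str]] = field(default_factory=list)

# Hypothetical Cypher templates for routine questions, keyed by intent.
# Labels, relationship types and properties are illustrative assumptions.
TEMPLATES = {
    "teacher_projects": (
        "MATCH (t:Teacher {name: $teacher})-[:LEADS]->(p:Project) "
        "RETURN p.title"
    ),
    "project_funding": "MATCH (p:Project {title: $project}) RETURN p.funding",
}

# Entity types each template needs, mapped to Cypher parameter names.
REQUIRED = {
    "teacher_projects": {"TEACHER": "teacher"},
    "project_funding": {"PROJECT": "project"},
}

# Hypothetical schema summary given to the LLM for open-ended questions.
SCHEMA = (
    "Nodes: Teacher(name), Project(title, funding), Paper(title, year)\n"
    "Relationships: (Teacher)-[:LEADS]->(Project), "
    "(Teacher)-[:AUTHORED]->(Paper)"
)


def build_llm_prompt(question: str) -> str:
    """Build a text-to-Cypher prompt; the wording is an assumption."""
    return (
        "Translate the question into a Cypher query for a property graph.\n"
        f"Schema:\n{SCHEMA}\n"
        "Use only the labels, relationships and properties listed above. "
        "Return a single Cypher query and nothing else.\n"
        f"Question: {question}\nCypher:"
    )


def route(question: str, parse: ParseResult) -> tuple[str, str, Optional[dict]]:
    """Return ("template", cypher, params) or ("llm", prompt, None)."""
    needed = REQUIRED.get(parse.intent)
    if needed is not None:
        # Fill template slots from the extracted entities.
        params = {needed[etype]: text
                  for etype, text in parse.entities if etype in needed}
        if set(params) == set(needed.values()):
            return "template", TEMPLATES[parse.intent], params
    # Unknown intent or missing slots: fall back to LLM-generated Cypher.
    return "llm", build_llm_prompt(question), None
```

For example, `route("Which projects does Zhang San lead?", ParseResult("teacher_projects", [("TEACHER", "Zhang San")]))` returns the `teacher_projects` template with `{"teacher": "Zhang San"}` as parameters, whereas an unrecognized intent yields a prompt for the LLM. Passing entities as Cypher parameters rather than splicing them into the query string keeps user text out of the query syntax; the resulting query and parameters could then be sent to a graph database driver.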