Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2024, Vol. 18, Issue (11): 2940-2953. DOI: 10.3778/j.issn.1673-9418.2406057

• Special Topic: Construction and Application of Vertical-Domain Large Models

Construction and Application of Large Language Model for Public Complaints with Knowledge Reasoning and Similarity Retrieval

LIU Xin, GAO Huiquan, SHAO Changheng, CHEN Ziliang, LU Wenjuan, YANG Huiru

  1. Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
  2. School of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
  • Online: 2024-11-01  Published: 2024-10-31

Abstract: Responding efficiently to public complaints is essential for intelligent management and for improving public satisfaction, and applying intelligent question answering to public complaints can save considerable time and human resources. However, rule-based and retrieval-based question-answering models rely on preset knowledge: they cannot respond effectively when a complaint falls outside that knowledge, and they cannot keep a conversation coherent over multiple turns. Existing large language models (LLMs) can converse fluently with users, but general-purpose LLMs lack knowledge of the complaint domain. Because the question-answer pairs in their training data do not cover the knowledge needed to answer users' questions, general-purpose LLMs produce incorrect or off-topic responses, that is, they hallucinate. To address these issues, this paper constructs PC-LLM, an intelligent question-answering large language model for the public-complaint domain. First, an entity-relation extraction model based on BERT-BiLSTM-CRF is designed to extract entities and their relations from complaint work orders, from which a complaint knowledge graph is built; in parallel, BERT is used to vectorize the complaint work orders and build a vector index library of work orders. In the response-generation stage, the entities and relations in a user's complaint are extracted and linked to entities in the complaint knowledge graph, where knowledge reasoning yields potential-relation prompts; meanwhile, the complaint is quickly searched against the work-order vector index library to retrieve similar complaints, from which similar-complaint prompts are built.
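The similar-complaint retrieval step described above can be pictured as a nearest-neighbour search over work-order embeddings. The following is a minimal sketch under stated assumptions, not PC-LLM's implementation: the embeddings are assumed to be precomputed (toy 3-dimensional vectors stand in for BERT vectors), the class name `BruteForceIndex` and the sample work orders are hypothetical, and the abstract does not say which index structure the authors use.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so that a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


class BruteForceIndex:
    """Hypothetical exhaustive cosine-similarity index over work-order embeddings."""

    def __init__(self, vectors, texts):
        self.vectors = l2_normalize(np.asarray(vectors, dtype=float))
        self.texts = list(texts)

    def search(self, query, k=2):
        """Return the k most similar (text, cosine score) pairs, best first."""
        q = l2_normalize(np.asarray(query, dtype=float))
        scores = self.vectors @ q
        top = np.argsort(-scores)[:k]
        return [(self.texts[i], float(scores[i])) for i in top]


# Toy embeddings standing in for BERT vectors of past work orders (assumed data).
work_orders = [
    "No heating in Building 3 since Monday",
    "Radiators cold in the whole residential block",
    "Construction noise late at night",
]
vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
index = BruteForceIndex(vectors, work_orders)

# Assumed query embedding for a new complaint about missing heating.
results = index.search([1.0, 0.0, 0.0], k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

A brute-force scan costs O(N·d) per query; for a large work-order archive an approximate nearest-neighbour index would be the natural replacement, though which one PC-LLM uses is not stated.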
Finally, the potential-relation prompts, the similar-complaint prompts, and the user's complaint are fused into a comprehensive prompt that guides the large language model to generate an accurate response. Experimental analysis shows that, on the complaint dataset, this model performs significantly better than ChatGPT4o, ERNIE Bot, Tongyi Qianwen, and other large language models.
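The knowledge-reasoning and prompt-fusion steps can be illustrated together. Everything in this sketch is a hypothetical stand-in: a toy complaint knowledge graph of (head, relation, tail) triples, a naive entity linker that matches mentions to graph nodes by string similarity with Python's `difflib`, "knowledge reasoning" approximated as collecting relation paths within two hops of each linked entity, and a plain-text template (`fuse_prompt`) that combines those hints, retrieved similar complaints, and the user's complaint. The abstract does not describe PC-LLM's actual linker, reasoning procedure, or prompt format.

```python
import difflib
from collections import defaultdict

# Toy complaint knowledge graph as (head, relation, tail) triples (assumed data).
TRIPLES = [
    ("Heating outage", "handled_by", "Heating Company"),
    ("Heating outage", "occurs_in", "Residential Community"),
    ("Heating Company", "supervised_by", "Housing Bureau"),
    ("Construction noise", "handled_by", "Urban Management Bureau"),
]


def build_graph(triples):
    """Adjacency list: head entity -> list of (relation, tail)."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return graph


def link_entity(mention, entities, cutoff=0.6):
    """Map a mention to the most similar graph entity (case-insensitive), or None."""
    lowered = {e.lower(): e for e in entities}
    best = difflib.get_close_matches(mention.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[best[0]] if best else None


def relation_hints(mentions, graph, entities, max_hops=2):
    """Collect 'head --relation--> tail' paths within max_hops of each linked entity."""
    hints = []
    for mention in mentions:
        start = link_entity(mention, entities)
        if start is None:
            continue
        seen, frontier = {start}, [start]
        for _ in range(max_hops):
            next_frontier = []
            for node in frontier:
                for rel, tail in graph.get(node, []):
                    hints.append(f"{node} --{rel}--> {tail}")
                    if tail not in seen:
                        seen.add(tail)
                        next_frontier.append(tail)
            frontier = next_frontier
    return list(dict.fromkeys(hints))  # drop duplicates, keep first-seen order


def fuse_prompt(complaint, hints, similar_complaints):
    """Combine relation hints, similar complaints, and the user complaint into one prompt."""
    lines = ["Potential relations from the complaint knowledge graph:"]
    lines += [f"- {h}" for h in hints] or ["- (none found)"]
    lines.append("Similar past complaints:")
    lines += [f"- {s}" for s in similar_complaints] or ["- (none found)"]
    lines.append(f"User complaint: {complaint}")
    lines.append("Reply to the user complaint using the information above.")
    return "\n".join(lines)


graph = build_graph(TRIPLES)
entities = sorted({h for h, _, _ in TRIPLES} | {t for _, _, t in TRIPLES})
hints = relation_hints(["heating outage"], graph, entities)
similar = ["No heating in Building 3 since Monday"]  # e.g., output of a retrieval step
prompt = fuse_prompt("Our building has had no heating for three days.", hints, similar)
print(prompt)
```

The two-hop limit keeps the hint list short enough to fit in a prompt; a deeper search would surface more context at the cost of more irrelevant relations.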

Key words: large language model, knowledge reasoning, similarity retrieval, public complaints, knowledge graph
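The BERT-BiLSTM-CRF extraction model named in the abstract ends in a CRF layer, whose decoding step selects the highest-scoring tag sequence as a whole rather than the best tag for each token in isolation. The sketch below shows standard Viterbi decoding with NumPy, followed by grouping of BIO tags into entity spans. The tag set, emission scores, and hard-coded transition penalties are illustrative assumptions: in a trained model the emissions would come from the BiLSTM over BERT features and the transitions would be learned. Start and end transition scores are omitted, and the abstract does not describe how relations are labelled.

```python
import numpy as np

# Hypothetical BIO tag set: organisations (ORG) and complaint events (EVT).
TAGS = ["O", "B-ORG", "I-ORG", "B-EVT", "I-EVT"]


def bio_transitions(tags, penalty=-1e4):
    """Transition scores that forbid I-X unless the previous tag is B-X or I-X."""
    trans = np.zeros((len(tags), len(tags)))
    for i, prev in enumerate(tags):
        for j, cur in enumerate(tags):
            if cur.startswith("I-") and prev[2:] != cur[2:]:
                trans[i, j] = penalty
    return trans


def viterbi_decode(emissions, transitions):
    """Best tag-index path; emissions is (seq_len, num_tags), transitions[i, j] scores tag i -> j."""
    seq_len = emissions.shape[0]
    score = emissions[0].copy()
    backpointers = np.zeros(emissions.shape, dtype=int)
    for t in range(1, seq_len):
        # candidate[i, j]: best path ending in tag i at t-1, extended with tag j at t.
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(backpointers[t, path[-1]]))
    return path[::-1]


def tags_to_spans(tokens, tag_ids):
    """Group BIO tags into (entity text, entity type) spans; stray I- tags are treated as O."""
    spans, current, etype = [], [], None
    for token, tag_id in zip(tokens, tag_ids):
        tag = TAGS[tag_id]
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), etype))
            current, etype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(token)
        else:
            if current:
                spans.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        spans.append((" ".join(current), etype))
    return spans


tokens = ["Heating", "Company", "ignored", "heating", "outage"]
emissions = np.array([
    [0.0, 2.0, 0.0, 0.0, 0.0],  # Heating -> B-ORG
    [0.0, 0.0, 2.0, 0.0, 0.0],  # Company -> I-ORG
    [2.0, 0.0, 0.0, 0.0, 0.0],  # ignored -> O
    [0.0, 0.0, 0.0, 2.0, 0.0],  # heating -> B-EVT
    [0.0, 0.0, 2.1, 0.0, 2.0],  # outage: I-ORG scores slightly above I-EVT
])
greedy = emissions.argmax(axis=1).tolist()  # per-token choice ends in an invalid I-ORG
best = viterbi_decode(emissions, bio_transitions(TAGS))
print(tags_to_spans(tokens, best))
```

In this toy input, per-token argmax would label "outage" as I-ORG right after B-EVT, an invalid BIO sequence; the transition penalty makes Viterbi choose I-EVT instead, which is the main reason a CRF layer is placed on top of the BiLSTM.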