Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue (10): 2656-2667. DOI: 10.3778/j.issn.1673-9418.2406013

• Special Issue on Constructions and Applications of Large Language Models in Specific Domains •

Research on Construction and Application of Knowledge Graph Based on Large Language Model

ZHANG Caike, LI Xiaolong, ZHENG Sheng, CAI Jiajun, YE Xiaozhou, LUO Jing   

  1. China Nuclear Power Operation Technology Corporation, Ltd., Wuhan 430223, China
    2. College of Science, China Three Gorges University, Yichang, Hubei 443002, China
  • Online: 2024-10-01 Published: 2024-09-29

基于大语言模型的知识图谱构建及应用研究

张才科,李小龙,郑胜,蔡家骏,叶小舟,罗静   

  1. 中核武汉核电运行技术股份有限公司,武汉 430223
    2. 三峡大学 理学院,湖北 宜昌 443002

Abstract: The massive operation and maintenance (O&M) data of nuclear power distributed control systems (DCS) contain rich operational experience and expert knowledge. How to effectively extract DCS alarm response information from these data and turn it into knowledge services is a current research hotspot and frontier in rapid DCS response. Because the multi-source heterogeneous data of nuclear power DCS lack a clear structure and standards, previous knowledge extraction has relied mainly on manual annotation and deep learning methods, which demand extensive domain knowledge and information processing capabilities and are limited by the heavy workload of data annotation. This study therefore proposes a knowledge extraction method based on a large language model (LLM) with a step-by-step prompting strategy and uses it to construct a DCS O&M knowledge graph (KG). On top of the knowledge graph, knowledge services such as intelligent question answering (Q&A) are developed using LLM technology and a secondary intent recognition method. Taking the DCS O&M data of a nuclear power plant as a case study, the work examines knowledge extraction, knowledge graph construction, and intelligent Q&A. The results show that the model achieves an overall precision (P) of 91.24%, recall (R) of 85.85%, and F1-score of 88.43%. The proposed method captures the key entities and attribute information in multi-source heterogeneous DCS O&M data fairly comprehensively and supports domain knowledge Q&A. It helps O&M personnel respond promptly to DCS alarm anomalies and analyze fault causes and response strategies, and it provides a reference for subsequent DCS O&M training and maintenance in power plants.
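
For readers who want to relate the three reported metrics to one another, the following is a minimal sketch of the standard precision, recall, and F1 definitions. It is not taken from the paper: the function names and the example counts are hypothetical, and the abstract does not say how the overall figures were aggregated. The harmonic mean of the reported P and R comes out close to, but not exactly at, the reported 88.43%; a small gap of this kind could arise from averaging per-category scores or from rounding, though that is only an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts (hypothetical helper, not from the paper)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_score(precision, recall)


# Harmonic mean of the P and R values reported in the abstract (in percent).
f1_from_reported = f1_score(91.24, 85.85)
print(f"F1 from reported P and R: {f1_from_reported:.2f}%")

# Illustrative counts only; these are NOT data from the study.
p, r, f1 = prf_from_counts(tp=8, fp=2, fn=2)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Working in percentages or in fractions gives the same F1 on the corresponding scale, since the harmonic mean is scale-invariant.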

Key words: nuclear power distributed control system, knowledge graph, large language model, knowledge extraction, intelligent question and answer (Q&A)

摘要: 海量核电分布式控制系统(DCS)运维数据蕴含着丰富的运维经验和专家知识,如何有效地从中抽取DCS报警响应相关信息并形成知识服务,是目前核电DCS快速响应的热点和前沿研究。由于核电DCS多源异构数据缺乏明确的结构和规范,以往的知识抽取主要依赖人工标注和深度学习的方式进行,但需要具备广泛的领域知识和信息处理能力,且受限于繁重的数据标注工作。提出了分步提示策略的大语言模型知识抽取方法,构建了DCS运维知识图谱;并基于大语言模型技术和二次意图识别方法,利用知识图谱开展智能问答等知识服务。通过以某核电厂DCS运维数据为例,重点就知识抽取、图谱构建、智能问答开展实例研究。结果表明,模型的总体精确率、召回率和F1值分别为91.24%、85.85%和88.43%,能够较为全面地获取DCS多源异构运维数据中的关键实体及属性信息,指导开展领域知识问答,有助于运维人员及时响应DCS报警异常,分析总结故障原因及响应策略,为后期的电厂DCS运维的培训和维护提供借鉴和参考。

关键词: 核电分布式控制系统, 知识图谱, 大语言模型, 知识抽取, 智能问答