Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (6): 1637-1647. DOI: 10.3778/j.issn.1673-9418.2311098

• Artificial Intelligence · Pattern Recognition •

Construction and Application of Knowledge Graph for Water Engineering Scheduling Based on Large Language Model

FENG Jun, CHANG Yanghong, LU Jiamin, TANG Hailin, LYU Zhipeng, QIU Yuchun

  1. College of Computer Science and Software Engineering, Hohai University, Nanjing 211100, China
  • Publication date: 2024-06-01 Release date: 2024-05-31

Abstract: As water conservancy develops and demand for informatization grows, processing and representing massive volumes of water-related data has become complex and laborious. Scheduling texts in particular are usually written in natural language without a clear structure or standard, and processing and applying such diverse data requires broad domain knowledge and professional expertise. To address this, a method for constructing a knowledge graph for water engineering scheduling based on large language models is proposed. Scheduling rule data are collected and preprocessed at the data layer; large language models are then used to mine and extract the knowledge contained in the data, completing ontology construction at the conceptual layer and extraction with a “three-step” prompting strategy at the instance layer. Through the interaction of the data, conceptual, and instance layers, high-performance extraction of rule texts is achieved, and the dataset and knowledge graph are constructed. Experimental results show that the large language model extraction method reaches an F1 value of 85.5%, and ablation experiments verify the effectiveness and rationality of each module of the model. The resulting knowledge graph integrates dispersed water conservancy rule information, handles unstructured text effectively, and provides visual querying and functional tracing. It helps practitioners assess inflow conditions and select appropriate scheduling schemes, providing important support for water conservancy decision-making and intelligent recommendation.

Key words: knowledge graph, large language model (LLM), ontology construction, knowledge extraction, water engineering scheduling
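
For readers unfamiliar with the F1 value reported in the abstract, the following is a minimal sketch of how precision, recall, and F1 are commonly computed for knowledge extraction. It assumes exact-match scoring over (head, relation, tail) triples; the abstract does not state the matching criterion the authors used, and the function name `triple_f1` and the sample triples are hypothetical illustrations, not data from the paper.

```python
from typing import Iterable, Tuple

# A knowledge-graph fact as (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match triple scoring.

    Assumption: a predicted triple counts as correct only if all three
    elements match a gold triple exactly; duplicates are ignored.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical scheduling-rule triples, for illustration only.
gold_triples = [
    ("Reservoir A", "flood_limit_level", "145 m"),
    ("Reservoir A", "located_on", "River X"),
    ("Rule 1", "applies_to", "Reservoir A"),
]
predicted_triples = [
    ("Reservoir A", "flood_limit_level", "145 m"),
    ("Reservoir A", "located_on", "River X"),
    ("Rule 1", "applies_to", "Reservoir B"),  # wrong tail entity
]

precision, recall, f1 = triple_f1(predicted_triples, gold_triples)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Here two of the three predicted triples match the gold set, so precision, recall, and F1 are all 2/3. Looser criteria, such as partial entity matching, would give different scores for the same predictions.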