Journal of Frontiers of Computer Science and Technology, 2025, Vol. 19, Issue 12: 3328-3339. DOI: 10.3778/j.issn.1673-9418.2505073

• Artificial Intelligence · Pattern Recognition •

基于大模型验证增强的产业链知识图谱构建研究

郑傲泽,张坤丽,李云龙,王影,袁颂瑞,吴鹏程,贾玉祥,昝红英   

  1. 郑州大学 计算机与人工智能学院,郑州 472000
  • 出版日期:2025-12-01 发布日期:2025-12-01

Construction of Industry Chain Knowledge Graph via Large Language Model-Based Verification Enhancement

ZHENG Aoze, ZHANG Kunli, LI Yunlong, WANG Ying, YUAN Songrui, WU Pengcheng, JIA Yuxiang, ZAN Hongying   

  1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 472000, China
  • Online:2025-12-01 Published:2025-12-01

摘要: 随着产业链分析的深入和信息化需求的增长,如何从海量文本中高效抽取企业及产业间的供需关系并构建产业链知识图谱,成为当前研究的主要问题。基于此,提出了一种基于大语言模型验证增强的产业链知识图谱构建方法。在概念层,设计了以“产业-企业”双视角的图谱构建层级体系,实现企业与产业的映射关系。在数据层,对结构化数据利用大语言模型进行知识抽取,构建产业链标注语料库(CERIC)。对于非结构化数据,设计了一个验证增强的大模型数据抽取框架VRTE-LLM。该框架使用CERIC语料库对大语言模型微调训练,以提升其在特定领域的知识理解与信息抽取能力,并根据任务需求和文本上下文,修正关系三元组中的完整性错误与偏差错误,通过使用预定义的规则库对三元组进行验证。实验结果显示,该框架在实体识别与关系抽取任务中分别达到80.9%与83.9%的准确率。所构建的产业链知识图谱包含39 627个三元组,涵盖四大产业领域、70余家企业的知识图谱,覆盖产业链上下游各关键环节,为产业链进行量化分析、趋势预测提供技术支持。

关键词: 知识图谱构建, 命名实体识别, 关系三元组抽取, 大语言模型(LLM), 产业链

Abstract: As industry chain analysis deepens and the demand for information processing grows, efficiently extracting supply-demand relationships between enterprises and industries from massive text data and building industry chain knowledge graphs has become a key research problem. To address it, this paper proposes an industry chain knowledge graph construction method enhanced by large language model (LLM) verification. At the conceptual level, the paper designs a hierarchical schema from the dual “industry-enterprise” perspective, which establishes the mapping between enterprises and industries. At the data level, an LLM extracts knowledge from structured data to build CERIC, a corpus for entity and relation annotation in the industry chain domain. For unstructured data, the paper designs VRTE-LLM, a verification-augmented relation triple extraction framework. VRTE-LLM fine-tunes the LLM on the CERIC corpus to strengthen its domain-specific knowledge comprehension and information extraction capabilities, corrects completeness errors and deviation errors in relation triples according to the task requirements and the textual context, and validates the triples against a predefined rule base, with the aim of improving the accuracy of enterprise information identified from unstructured text. Experimental results show that the framework achieves 80.9% accuracy in entity recognition and 83.9% accuracy in relation extraction. The constructed industry chain knowledge graph contains 39,627 triples, spans four major industrial domains and more than 70 enterprises, and covers the key upstream and downstream segments of the industry chain. The method provides technical support for quantitative analysis and trend prediction in industry chain research.

Key words: knowledge graph construction, named entity recognition, relational triple extraction, large language model (LLM), industry chain
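
The abstract states that extracted triples are validated against a predefined rule base but does not list the rules. As a rough illustration only, the following Python sketch shows one way such a check could look. The relation names, entity types, `Triple` class, and `validate` function are all hypothetical and are not taken from the paper. Treating an empty slot as a "completeness" problem and a type mismatch as a "deviation" problem is this sketch's own reading of those terms.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical rule base: relation -> (required head type, required tail type).
# These relations and types are illustrative assumptions, not the paper's schema.
RULES = {
    "supplies_to": ("Enterprise", "Enterprise"),
    "belongs_to_industry": ("Enterprise", "Industry"),
    "upstream_of": ("Industry", "Industry"),
}


@dataclass(frozen=True)
class Triple:
    head: Optional[str]
    head_type: Optional[str]
    relation: Optional[str]
    tail: Optional[str]
    tail_type: Optional[str]


def validate(triple: Triple) -> List[str]:
    """Return a list of problems; an empty list means the triple passes."""
    problems = []

    # Completeness check: every slot of the triple must be filled.
    for name in ("head", "head_type", "relation", "tail", "tail_type"):
        if not getattr(triple, name):
            problems.append(f"incomplete: missing {name}")
    if problems:
        return problems

    # Deviation check: the relation must exist in the rule base ...
    rule = RULES.get(triple.relation)
    if rule is None:
        return [f"deviation: unknown relation {triple.relation!r}"]

    # ... and the entity types must match what the rule requires.
    expected_head, expected_tail = rule
    if triple.head_type != expected_head:
        problems.append(
            f"deviation: head type {triple.head_type!r}, expected {expected_head!r}"
        )
    if triple.tail_type != expected_tail:
        problems.append(
            f"deviation: tail type {triple.tail_type!r}, expected {expected_tail!r}"
        )
    return problems
```

In a pipeline of this shape, triples with a non-empty problem list could be sent back to the model for correction or discarded. Which of the two VRTE-LLM does at this stage is not specified in the abstract.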