Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (11): 2901-2911.DOI: 10.3778/j.issn.1673-9418.2406054

• Special Issue on Constructions and Applications of Large Language Models in Specific Domains •

Construction Method of Textbook Knowledge Graph Based on Multimodal and Knowledge Distillation

LIU Jun, LENG Fangling, WU Wangwang, BAO Yubin   

  1. Information Construction and Network Security Office, Northeastern University, Shenyang 110819, China
    2. School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
  • Online:2024-11-01 Published:2024-10-31

Abstract: To efficiently construct a multimodal subject knowledge graph for the education domain, a textbook-text entity-relation extraction algorithm based on large-model knowledge distillation and multi-model collaborative inference is proposed. In the model training phase, a closed-source model with 100 billion parameters annotates the text data, achieving implicit knowledge distillation. The open-source billion-parameter model is then instruction fine-tuned on the domain data to strengthen its instruction-following ability on the entity-relation extraction task. In the model inference phase, the closed-source model serves as the guiding model and the open-source billion-parameter model serves as the executing model. Experimental results show that knowledge distillation, multi-model collaboration, and domain-data instruction fine-tuning are all effective, significantly improving instruction-prompted entity-relation extraction from textbook text. In addition, a multimodal named entity recognition algorithm with explicit and implicit knowledge enhancement is proposed for textbook diagrams. First, image OCR (optical character recognition) and a vision-language model are used to extract the in-image text and a global content description from each textbook diagram. Then, explicit knowledge-base retrieval and implicit LLM prompt enhancement are used to obtain auxiliary knowledge potentially associated with each image-title pair, and the knowledge from the explicit knowledge base and the implicit LLM is fused to form the final auxiliary knowledge. Finally, the diagram's auxiliary knowledge is concatenated with the diagram title to perform multimodal named entity recognition on textbook diagram titles. Experimental results show that the algorithm achieves strong performance compared with prior methods while improving interpretability.
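The two-stage extraction pipeline summarized above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `teacher_annotate` and `student_extract` are hypothetical stubs standing in for calls to the closed-source 100-billion-parameter teacher and the instruction-tuned open-source billion-parameter student, and the hard-coded triples exist only so the sketch runs.

```python
# Minimal sketch of distillation-then-collaboration for entity-relation
# extraction. All model calls are hypothetical stubs, not a real API.

INSTRUCTION = "Extract (head, relation, tail) triples from the sentence."

def teacher_annotate(sentence):
    """Stub for the closed-source ~100B teacher model: it labels raw
    textbook text with triples, achieving implicit knowledge distillation."""
    # A hard-coded example annotation; a real system would prompt the LLM.
    return [("binary tree", "is-a", "tree")]

def student_extract(sentence, guidance):
    """Stub for the open-source ~1B student model after domain-data
    instruction fine-tuning on the teacher's annotations."""
    return [("binary tree", "is-a", "tree")]

def build_finetune_corpus(sentences):
    """Training phase: the teacher annotates textbook sentences, and each
    (instruction, input, output) record becomes instruction-tuning data."""
    return [
        {"instruction": INSTRUCTION, "input": s, "output": teacher_annotate(s)}
        for s in sentences
    ]

def collaborative_inference(sentence):
    """Inference phase: the teacher acts as the guiding model (here it merely
    supplies the task instruction) and the student executes the extraction."""
    return student_extract(sentence, guidance=INSTRUCTION)

corpus = build_finetune_corpus(["A binary tree is a tree data structure."])
triples = collaborative_inference("A binary tree is a tree data structure.")
```

In a real deployment the guiding model would do more than pass an instruction string (for example, selecting demonstrations or verifying the student's output); the sketch only fixes the division of roles the abstract describes.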

Key words: large language model, disciplinary knowledge graph, entity relationship extraction, multimodal named entity recognition, knowledge distillation
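The multimodal diagram pipeline in the abstract (OCR and vision-language description, then explicit knowledge-base retrieval plus implicit LLM prompting, then fusion and concatenation with the title) can likewise be sketched. Every function here is a hypothetical stub with invented return values; only the data flow mirrors the abstract.

```python
# Sketch of explicit/implicit knowledge enhancement for diagram-title NER.
# All four "model" functions are placeholder stubs, not real libraries.

def ocr_text(image):
    """Stub: text found inside the diagram via OCR."""
    return "root node; leaf node"

def vlm_caption(image):
    """Stub: global content description from a vision-language model."""
    return "A diagram of a binary tree with labeled nodes."

def kb_retrieve(query):
    """Stub: explicit knowledge-base retrieval for the image-title pair."""
    return ["Binary tree: a tree in which each node has at most two children."]

def llm_hint(query):
    """Stub: implicit auxiliary knowledge elicited by prompting an LLM."""
    return ["Root node: the topmost node of a tree."]

def build_auxiliary_knowledge(image, title):
    # Step 1: extract in-image text and a global description of the diagram.
    visual_ctx = [ocr_text(image), vlm_caption(image)]
    # Step 2: query both knowledge sources with the image-title pair context.
    query = title + " " + " ".join(visual_ctx)
    # Step 3: fuse explicit and implicit knowledge (simple concatenation here).
    return " ".join(kb_retrieve(query) + llm_hint(query))

def ner_input(image, title):
    # Step 4: concatenate auxiliary knowledge with the diagram title, giving
    # the enriched sequence on which named entity recognition is run.
    return build_auxiliary_knowledge(image, title) + " [SEP] " + title
```

The `[SEP]` delimiter and string-concatenation fusion are illustrative choices; the paper's fusion step may weigh or filter the two knowledge sources rather than simply joining them.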
