Journal of Frontiers of Computer Science and Technology, 2025, Vol. 19, Issue (7): 1729-1746. DOI: 10.3778/j.issn.1673-9418.2411008

• Frontiers·Surveys •

Research on Development Status of Multimodal Knowledge Graph Fusion Technology in Medical Field

SHI Zhenpu, LYU Xiao, DONG Yanru, LIU Jing, WANG Xiaoyan   

  1. School of Medical Information Engineering, Shandong University of Traditional Chinese Medicine, Jinan 250355, China
  2. Information Department of Shandong Provincial Hospital of Traditional Chinese Medicine, Jinan 250014, China
  • Online: 2025-07-01 Published: 2025-06-30

医学领域多模态知识图谱融合技术发展现状研究

时振普,吕潇,董彦如,刘静,王晓燕   

  1. 山东中医药大学 医学信息工程学院,济南 250355
  2. 山东省中医院 信息科,济南 250014

Abstract: Multimodal knowledge graphs use text, images, and other modalities to model entities, relations, and events. They demonstrate strong data processing capabilities and provide a richer and deeper understanding for the field of artificial intelligence. As a result, they have drawn attention in the medical field and have achieved notable results in areas such as medical data processing and the mining of latent value. To clarify the state of research on multimodal knowledge graphs in medicine, this paper first introduces the fundamentals of multimodal knowledge graphs, the difficulties of constructing them in the medical field, and the related datasets. Second, it analyzes the key technologies involved in multimodal knowledge graph fusion, such as multimodal entity alignment and multimodal entity linking, from the perspectives of traditional methods and deep learning methods. The analysis focuses on feature extraction and fusion methods for the text, image, and audio modalities, summarizes the advantages and disadvantages of each multimodal fusion method, and describes the application of multimodal large language models to multimodal fusion. Finally, the paper reviews in detail the research progress of multimodal knowledge graphs in areas such as medical visual question answering, drug development, and imaging-assisted diagnosis. On this basis, it analyzes the limitations and challenges that medical multimodal knowledge graphs face with respect to multimodal fusion and datasets, and outlines future research directions.

Key words: multimodal knowledge graph, knowledge graph fusion, multimodal large language model, intelligent healthcare

摘要: 多模态知识图谱利用文本、视觉等多模态数据对实体、关系及事件进行建模,展现出强大的数据处理能力,为人工智能领域提供更丰富、深入的理解,也因此备受医学领域瞩目,其在医学数据处理、潜在价值挖掘等多类研究中均取得显著成效。为更好地厘清多模态知识图谱在医学领域的研究现状,阐述多模态知识图谱基本知识及医学领域多模态知识图谱构建难点与相关数据集;从传统方法及深度学习方法两个角度分析多模态知识图谱融合涉及的多模态实体对齐与多模态实体链接等关键技术,重点分析文本、图像、音频三个模态的特征提取及融合方法,总结各多模态融合方法优缺点并阐述多模态大语言模型在多模态融合中的应用;详细梳理多模态知识图谱在医学视觉问答、药物研发、影像辅助诊断等领域的研究进展。在此基础上,分析归纳医学领域多模态知识图谱在多模态融合与数据集方面的局限性及面临的挑战,并对未来研究方向进行展望。

关键词: 多模态知识图谱, 知识图谱融合, 多模态大语言模型, 智能医疗