Journal of Frontiers of Computer Science and Technology

Dual-layer Fusion Knowledge Reasoning with Enhanced Multi-modal Features

JING Boxiang, WANG Hairong, WANG Tong, YANG Zhenye   

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
    2. The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China

Abstract: Most existing multi-modal knowledge reasoning methods fuse the multi-modal features extracted by pre-trained models directly, through concatenation or attention, and thus often ignore the heterogeneity among modalities and the complexity of their interactions. To address this, a dual-layer fusion knowledge reasoning method with enhanced multi-modal features is proposed. The structural information embedding module uses an adaptive graph attention mechanism to filter and aggregate key neighbor information, enhancing the semantic representation of entity and relation embeddings. The multi-modal information embedding module applies different attention mechanisms to capture both the features unique to each modality and the features common across modalities, and uses the complementary information in the common features for cross-modal interaction, reducing the heterogeneity gap between modalities. The multi-modal feature fusion module adopts a dual-layer strategy that combines low-rank multi-modal feature fusion with decision fusion, realizing dynamic, complex interactions both within and across modalities while weighing each modality's contribution to reasoning, which yields more comprehensive predictions. To verify the effectiveness of the proposed method, experiments were carried out on the FB15K-237, DB15K, and YAGO15K datasets. The results show that, compared with multi-modal reasoning methods, MRR and Hits@1 on FB15K-237 improve by an average of 3.6% and 2.2%, respectively; compared with single-modal reasoning methods, MRR and Hits@1 improve by an average of 13.7% and 14.6%, respectively.

Key words: multi-modal knowledge graph, link prediction, knowledge reasoning, multi-modal feature fusion
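The dual-layer fusion strategy described in the abstract combines low-rank multi-modal feature fusion with decision-level fusion. The PyTorch sketch below illustrates the general shape of such a pipeline; it is a minimal sketch under illustrative assumptions, not the authors' implementation: the three modalities (structural, visual, textual), the class names, dimensions, rank, and the learned softmax weighting over modality scores are all ours.

```python
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    """Low-rank multi-modal fusion: approximates the full tensor product
    of modality embeddings with rank-R modality-specific factor matrices."""
    def __init__(self, dims, out_dim, rank=4):
        super().__init__()
        # One factor per modality; d + 1 accommodates the constant 1
        # appended to each embedding, as in standard low-rank fusion.
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d + 1, out_dim) * 0.1) for d in dims]
        )

    def forward(self, feats):
        # feats: list of (batch, d_m) tensors, one per modality.
        fused = None
        for f, w in zip(feats, self.factors):
            ones = torch.ones(f.size(0), 1, device=f.device)
            fm = torch.cat([f, ones], dim=1)           # (batch, d_m + 1)
            proj = torch.einsum('bd,rdo->bro', fm, w)  # (batch, rank, out)
            # Element-wise product across modalities per rank slice.
            fused = proj if fused is None else fused * proj
        return fused.sum(dim=1)                        # sum over rank slices

class DecisionFusion(nn.Module):
    """Decision-level fusion: learn a contribution weight for each
    modality's own score plus the jointly fused score."""
    def __init__(self, n_modalities):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_modalities + 1))

    def forward(self, modality_scores, fused_score):
        # modality_scores: list of (batch,) plausibility scores.
        scores = torch.stack(modality_scores + [fused_score], dim=-1)
        weights = torch.softmax(self.logits, dim=-1)
        return (scores * weights).sum(dim=-1)

# Toy usage: structural, visual, and textual embeddings for 8 triples.
s, v, t = torch.randn(8, 200), torch.randn(8, 128), torch.randn(8, 300)
fusion = LowRankFusion(dims=[200, 128, 300], out_dim=1, rank=4)
fused_score = fusion([s, v, t]).squeeze(-1)      # (8,) joint score
per_mod = [s.sum(-1), v.sum(-1), t.sum(-1)]      # stand-in per-modality scores
final = DecisionFusion(n_modalities=3)(per_mod, fused_score)
```

In this reading, the low-rank layer handles the dynamic inter- and intra-modal interactions at feature level, while the decision layer accounts for how much each modality should contribute to the final prediction.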
