Journal of Frontiers of Computer Science and Technology


Recommendation Method Based on Cross-Modal Graph Masking and Feature Enhancement

JING Li, ZHENG Gonghao, LI Xiaohan, YU Mengyuan

  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China


Abstract: To address data noise and the insufficient expression of multimodal information in traditional multimodal recommendation systems, this paper proposes a recommendation method based on cross-modal graph masking and feature enhancement. The method first uses a pre-trained CLIP model to extract semantically consistent textual and visual features, and builds an item-item graph from the similarity of items' multimodal features, providing semantically rich contextual information for item representations. Second, a cross-modal graph mask reconstruction method is designed that fully exploits inter-modality feature information to reduce data noise and enhance features, after which a graph convolutional network learns user-item interaction information. In the final prediction, user preference scores capture the user's preferences for the target item more accurately, and weighting the preference scores from different aspects completes the recommendation task more effectively. Finally, a multi-task training strategy jointly optimizes model learning, taking both multimodal information and interaction information into account and effectively improving the accuracy of multimodal recommendation. Experimental results on four public datasets (Women Clothing, Men Clothing, Beauty, and Toys & Games) show that the proposed model outperforms benchmark algorithms in terms of Recall and normalized discounted cumulative gain (NDCG); for example, it improves on the strongest baseline by 8.03% in Recall@20 and 7.30% in NDCG@20.
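The feature extraction and graph construction step can be made concrete with a short sketch. The snippet below is a minimal illustration, not the paper's released code: it assumes the Hugging Face `transformers` checkpoint `openai/clip-vit-base-patch32`, cosine similarity, a symmetric top-k neighborhood, and simple averaging of the per-modality graphs; the paper's exact similarity measure, k, and fusion rule may differ.

```python
# Minimal sketch: CLIP feature extraction and a kNN item-item graph.
# Assumptions (not from the paper): openai/clip-vit-base-patch32, cosine
# similarity, symmetric top-k graph, row normalization, averaged modalities.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def encode_items(titles, image_paths):
    """Return L2-normalized text and visual features for each item."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=titles, images=images,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
    return F.normalize(txt, dim=-1), F.normalize(img, dim=-1)

def knn_item_graph(feat, k=10):
    """Top-k cosine-similarity item-item graph as a dense adjacency matrix."""
    sim = feat @ feat.t()                 # cosine similarity (features normalized)
    sim.fill_diagonal_(-float("inf"))     # exclude self-loops from top-k
    topk = sim.topk(k, dim=-1).indices
    adj = torch.zeros_like(sim)
    adj.scatter_(1, topk, 1.0)
    adj = ((adj + adj.t()) > 0).float()   # symmetrize
    deg = adj.sum(-1, keepdim=True).clamp(min=1)
    return adj / deg                      # row-normalize for message passing

# Usage (one graph per modality, fused by averaging -- an assumption):
#   txt_feat, img_feat = encode_items(titles, image_paths)
#   adj = 0.5 * (knn_item_graph(txt_feat) + knn_item_graph(img_feat))
```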

Key words: Graph Representation Learning, Graph Neural Networks, Multimodal Recommendation System, Self-Supervised Learning

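The cross-modal graph mask reconstruction step described in the abstract can likewise be sketched. The module below is an illustrative reading, not the authors' implementation: it masks a random subset of item nodes in both modalities, propagates features over the item-item graph, and reconstructs each modality's masked features from the other modality's context. The masking rule, the linear cross-modal decoders, and the cosine reconstruction loss are all assumptions.

```python
# Illustrative cross-modal graph mask reconstruction (the paper's masking
# rule, decoders, and loss may differ). Masked text features are recovered
# from graph-propagated visual context and vice versa, which denoises and
# aligns the two modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalMaskReconstruction(nn.Module):
    def __init__(self, dim, mask_ratio=0.3):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.txt2img = nn.Linear(dim, dim)   # hypothetical cross-modal decoders
        self.img2txt = nn.Linear(dim, dim)

    def propagate(self, adj, x, layers=2):
        for _ in range(layers):              # simple GCN-style smoothing
            x = adj @ x
        return x

    def forward(self, adj, txt, img):
        n = txt.size(0)
        mask = torch.rand(n, device=txt.device) < self.mask_ratio
        txt_ctx = self.propagate(adj, torch.where(mask[:, None], self.mask_token, txt))
        img_ctx = self.propagate(adj, torch.where(mask[:, None], self.mask_token, img))
        # reconstruct each modality's masked nodes from the other modality
        txt_rec = self.img2txt(img_ctx)
        img_rec = self.txt2img(txt_ctx)
        loss = (1 - F.cosine_similarity(txt_rec[mask], txt[mask], dim=-1)).mean() \
             + (1 - F.cosine_similarity(img_rec[mask], img[mask], dim=-1)).mean()
        return loss, txt_ctx, img_ctx
```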
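Finally, the weighted preference scoring and multi-task objective can be written compactly. The functions below are a hedged sketch: the three preference aspects (ID, text, and image embedding spaces), the fusion weights `w`, and the BPR-plus-reconstruction balance `lam` are illustrative choices, since the abstract does not specify them.

```python
# Illustrative weighted preference scoring and multi-task loss; the fusion
# weights and loss balance are assumptions, not the paper's settings.
import torch
import torch.nn.functional as F

def preference_score(u_id, i_id, u_txt, i_txt, u_img, i_img, w=(1.0, 0.5, 0.5)):
    """Weighted sum of ID-, text-, and image-space preference scores."""
    s_id  = (u_id  * i_id ).sum(-1)
    s_txt = (u_txt * i_txt).sum(-1)
    s_img = (u_img * i_img).sum(-1)
    return w[0] * s_id + w[1] * s_txt + w[2] * s_img

def multitask_loss(pos_score, neg_score, recon_loss, lam=0.1):
    """BPR ranking loss on interactions plus weighted mask-reconstruction loss."""
    bpr = -F.logsigmoid(pos_score - neg_score).mean()
    return bpr + lam * recon_loss
```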