Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (9): 2470-2478. DOI: 10.3778/j.issn.1673-9418.2409043

• Artificial Intelligence · Pattern Recognition •


Recommendation Method Based on Cross-Modal Graph Masking and Feature Enhancement

JING Li, ZHENG Gonghao, LI Xiaohan, YU Mengyuan   

  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China
  • Online:2025-09-01 Published:2025-09-01


Abstract: To address data noise and the insufficient expression of multimodal information in traditional multimodal recommendation systems, this paper proposes a recommendation method based on cross-modal graph masking and feature enhancement. The method first uses a pre-trained CLIP model to extract semantically consistent textual and visual features, and builds an item-item graph from the similarity of items' modality features to provide semantically rich contextual information for item representations. It then designs a cross-modal graph mask reconstruction method that fully exploits inter-modality feature information to reduce data noise and enhance features, after which a graph convolutional network learns user-item interaction information. In the final prediction, user preference scores are used to more accurately capture a user's preference for the target item; by weighting preference scores from different aspects, the recommendation task is completed more effectively. Finally, a multi-task joint learning framework is employed to train the model, jointly considering multimodal and interaction information and effectively improving the accuracy of multimodal recommendation. Experimental results on four public datasets (Women Clothing, Men Clothing, Beauty, and Toys & Games) show that the proposed method outperforms baseline algorithms on Recall and normalized discounted cumulative gain (NDCG), improving on the strongest baseline by 8.04% in Recall@20 and 7.31% in NDCG@20.
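The item-item graph construction and graph-convolutional propagation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes generic modality embeddings standing in for CLIP outputs, and the function names (`build_item_graph`, `propagate`), the top-k cosine-similarity neighborhood, and the LightGCN-style layer averaging are hypothetical simplifications chosen for brevity.

```python
import numpy as np

def build_item_graph(features, k=5):
    """Build a row-normalized top-k item-item adjacency matrix from
    modality features (shape: n_items x dim) via cosine similarity."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.clip(norms, 1e-12, None)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-similarity
    adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        topk = np.argpartition(-sim[i], k)[:k]  # k most similar items
        adj[i, topk] = 1.0
    adj = np.maximum(adj, adj.T)            # symmetrize the graph
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.clip(deg, 1.0, None)    # row-normalize by degree

def propagate(adj_norm, features, layers=2):
    """LightGCN-style propagation: repeatedly average neighbor
    features and combine the representations from all layers."""
    h = features
    out = h.copy()
    for _ in range(layers):
        h = adj_norm @ h
        out += h
    return out / (layers + 1)
```

Each item's representation is thus enriched with context from its most semantically similar neighbors, which is the role the item-item graph plays before the mask-reconstruction and interaction-learning stages.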

Key words: graph representation learning, multimodal recommendation system, graph neural networks, self-supervised learning