Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

• Academic Research •

A Multimodal Recipe Representation Learning Method Based on Heterogeneous Information Networks

ZHANG Xiaoyan (张霄雁), JIANG Shiqi (江诗琪), MENG Xiangfu (孟祥福)

  1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China

Abstract: Current recipe representation learning methods learn recipe embeddings mainly by aligning recipe text with the corresponding images, or by using an adjacency matrix to capture the relationships between recipes and their ingredients. However, these methods fuse information only coarsely, fail to mine the cross-modal interaction information in depth, and struggle to dynamically evaluate the strength of the associations between recipe components, which limits the representational capacity of the model. To address these problems, this paper proposes CookRec2vec, a multimodal recipe representation learning model based on heterogeneous information networks. The model integrates visual, textual, and relational information into the recipe embedding, and uses adaptive adjacency relationships to mine and quantify more fully the associations between recipe components and their strengths. In addition, an explicit modeling approach based on high-order co-occurrence matrices supplies complementary information while preserving the original characteristics, which significantly improves the expressiveness of the recipe features. Experimental results show that the proposed model outperforms existing mainstream methods on recipe classification and makes notable progress on embedding prediction for innovative dishes.

Key words: representation learning, graph embedding, heterogeneous information network, cross-modal fusion, adversarial attack, node classification
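
The abstract mentions high-order co-occurrence matrices without defining them. As a rough illustration only, the sketch below shows one common way such matrices can be built for ingredients: a first-order matrix counts how often two ingredients appear in the same recipe, and a second-order matrix counts two-step paths through a shared neighbor. The function names, the toy recipes, and the choice of a matrix square for the second order are all assumptions made for illustration; the abstract does not specify how CookRec2vec constructs its matrices.

```python
from itertools import combinations


def cooccurrence(recipes, vocab):
    """First-order co-occurrence: C[i][j] = number of recipes containing both i and j."""
    index = {name: k for k, name in enumerate(vocab)}
    n = len(vocab)
    C = [[0] * n for _ in range(n)]
    for ingredients in recipes:
        # De-duplicate ingredients within a recipe before counting pairs.
        ids = sorted({index[name] for name in ingredients})
        for i, j in combinations(ids, 2):
            C[i][j] += 1
            C[j][i] += 1
    return C


def matmul(A, B):
    """Plain-Python product of two dense matrices given as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [
        [sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for row in A
    ]


def second_order(C):
    """Second-order co-occurrence (assumed form): C @ C with the diagonal zeroed."""
    C2 = matmul(C, C)
    for i in range(len(C2)):
        C2[i][i] = 0
    return C2


# Hypothetical toy data, not taken from the paper.
vocab = ["egg", "tomato", "basil", "flour"]
recipes = [["egg", "tomato"], ["tomato", "basil"], ["egg", "flour"]]

C1 = cooccurrence(recipes, vocab)
C2 = second_order(C1)

# "egg" and "basil" never share a recipe, but both co-occur with "tomato".
print(C1[0][2], C2[0][2])
```

In this toy example, egg and basil have no first-order co-occurrence, yet the second-order matrix links them through tomato. This is the sense in which a higher-order matrix can carry information complementary to the direct adjacency.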