Journal of Frontiers of Computer Science and Technology, 2025, Vol. 19, Issue (12): 3279-3289. DOI: 10.3778/j.issn.1673-9418.2508057

• Special Issue on Theory and Technology of Multimodal Large Language Model •

Cross-Modal Molecule Retrieval Based on Molecular Structure and Curriculum Learning

LIN Nankai, WU Yiqian, HUANG Lini, WU Hongyan, XU Zhen, WANG Lianxi   

  1. School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China
    2. College of Computer, National University of Defense Technology, Changsha 410073, China
  • Online: 2025-12-01 Published: 2025-12-01

基于分子结构与课程学习的跨模态分子检索

林楠铠,伍奕倩,黄丽霓,武洪艳,徐榛,王连喜   

  1. 广东外语外贸大学 信息科学与技术学院,广州 510006
    2. 国防科技大学 计算机学院,长沙 410073

Abstract: In drug discovery and materials science, pharmacologists often need to link molecular structures with textual descriptions in order to screen novel compounds efficiently and accelerate molecular design. Cross-modal retrieval methods have developed rapidly in recent years: early approaches relied mainly on statistical analysis or hashing to associate different modalities, and the introduction of deep learning significantly improved alignment performance. This stronger alignment capability, however, comes with substantial training overhead. To address this issue, some studies have introduced curriculum learning to improve retrieval efficiency. Yet the sample difficulty metrics in these approaches are usually based on representational similarity and overlook molecular structural information, which may lead to biased ranking results. To tackle this problem, this paper proposes a structure-aware curriculum learning framework that defines sample difficulty from the perspective of molecular-level structure. This allows the curriculum learning process to better capture intrinsic structural information, reduces the interference of misjudged samples during training, and improves model robustness on molecules that are structurally similar but have divergent representations. Experimental results on multiple benchmark datasets show that the proposed method consistently outperforms existing approaches on Hits@1, MRR, and other evaluation metrics. In both text-to-molecule and molecule-to-text retrieval, the existing state-of-the-art model achieves larger performance gains under the proposed framework than under existing curriculum learning methods, validating the generality and effectiveness of the proposed strategy in bidirectional retrieval scenarios.

Key words: cross-modal molecule retrieval, curriculum learning, structure-aware, molecular representation, sample difficulty measurement
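
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Hits@k and MRR are commonly computed for a retrieval task. It is not taken from the paper: the function names, the toy score matrix, the convention that the correct candidate for query i sits at index i, and the optimistic handling of ties are all assumptions made for illustration.

```python
def ranks_from_scores(scores):
    """Return the 1-based rank of the correct candidate for each query.

    Assumption: scores[i][j] is the similarity between query i and
    candidate j, and the correct candidate for query i is candidate i.
    """
    ranks = []
    for i, row in enumerate(scores):
        target = row[i]
        # Rank is 1 plus the number of candidates scored strictly higher
        # than the correct one (ties are resolved optimistically).
        ranks.append(1 + sum(1 for s in row if s > target))
    return ranks


def hits_at_k(ranks, k):
    """Fraction of queries whose correct candidate is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical 3x3 similarity matrix (e.g. three text queries, three molecules).
toy_scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
toy_ranks = ranks_from_scores(toy_scores)
toy_hits1 = hits_at_k(toy_ranks, 1)
toy_mrr = mean_reciprocal_rank(toy_ranks)
```

In this toy case the second query ranks its correct candidate second, so Hits@1 is 2/3 while MRR still gives that query partial credit of 1/2. The same functions apply to either retrieval direction by transposing the score matrix.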

摘要: 在药物研发与材料科学领域,药理学家往往需要在分子结构与文本描述之间建立联系,以便高效筛选新化合物并加速分子设计过程。近年来,跨模态检索方法快速发展,早期主要依赖统计分析或哈希映射来关联不同模态,而深度学习的引入则显著提升了对齐效果。然而,在提高对齐能力的同时,现有方法也带来了较高的训练开销。为此,一些研究引入课程学习以提升检索效率,但其样本的难度度量多依赖于表征相似度,忽视了分子结构信息,从而可能导致排序结果存在偏差。针对上述问题,提出了一种结构感知的课程学习框架,从分子层级结构出发定义样本难度,使课程学习过程能够更好地捕捉分子的内在结构信息,减少误判样本在训练过程中的干扰,增强模型在处理结构相似但表征差异较大的分子样本时的鲁棒性。在多个主流基准数据集上的实验结果表明,该方法在Hits@1、MRR等指标上均优于现有方法。在文本-分子与分子-文本两类检索任务中,现有SOTA模型在所提框架下均取得了比现有课程学习方法更明显的性能提升,验证了所提策略在双向检索中的普适性与有效性。

关键词: 跨模态分子检索, 课程学习, 结构感知, 分子表征, 样本难度度量