Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (8): 2123-2134. DOI: 10.3778/j.issn.1673-9418.2407042

• Graphics and Image •


Zero-Shot Image Classification Based on Feature Enhancement and Contrastive Embedding

LIU Ying, FENG Xiaodong, HE Jinglu   

  1. Center for Image and Information Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
  • Online:2025-08-01 Published:2025-07-31



Abstract: Zero-shot image classification aims to predict unseen classes by exploiting the information about seen classes that is available during training. Feature-generation methods synthesize visual features of unseen classes with a generative model guided by semantic information, and then train a supervised learning model in the visual feature space to complete the prediction. However, the visual feature space lacks sufficient discriminative information, so the resulting classification is suboptimal. To obtain features with more discriminative information, this paper builds a contrastive embedding module based on contrastive learning that projects the generated features and the real features into a contrastive embedding space, where embedding is performed at both the instance level and the class level; contrastive learning is used to better capture the differences between instances as well as between classes. A supervised learning model is finally trained in the contrastive embedding space to complete the prediction. In addition, to fully exploit the data distribution of the visual features and to obtain generated features closer to the real features and their semantic information, this paper uses a Vision Transformer for visual feature extraction and adds a dual-prototype constraint strategy to the feature-generation process, using clustering prototypes and class prototypes to help the generative model learn the data distribution. This strategy constrains the generated features to be close to the clustering prototypes of the real features, and the class prototypes of the generated features to be close to the clustering prototypes of the real features. Experiments on three public datasets demonstrate the effectiveness of the proposed algorithm.
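The abstract does not give the exact contrastive objective used for instance-level embedding. The following is a minimal NumPy sketch of a standard InfoNCE-style loss of the kind commonly used for this purpose, assuming each generated feature is paired with its matching real feature and all other pairs in the batch serve as negatives; function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Instance-level InfoNCE loss: row i of `positives` is the positive
    pair of row i of `anchors`; all other rows act as negatives."""
    # L2-normalise so the dot product is cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                     # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the diagonal holds each anchor's similarity to its own positive
    return -np.mean(np.diag(log_prob))
```

A lower loss indicates that each anchor is closer to its own positive than to the other samples, which is the behaviour the contrastive embedding space is trained to produce.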

Key words: zero-shot image classification, generative model, contrastive learning, clustering prototype, class prototype
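The dual-prototype constraint described in the abstract can be sketched in the same spirit: clustering prototypes of the real features are obtained (here with plain k-means), and two L2 penalties pull (1) each generated feature toward its nearest real clustering prototype and (2) the class prototype (mean) of the generated features toward the real clustering prototypes. The paper's exact loss terms and clustering details may differ; this is a hedged illustration with assumed names.

```python
import numpy as np

def cluster_prototypes(features, k, iters=10, seed=0):
    """Plain k-means to compute k clustering prototypes of real features."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # squared distance of every feature to every center: (n, k)
        d = ((features[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if np.any(assign == j):          # keep old center if cluster empties
                centers[j] = features[assign == j].mean(0)
    return centers

def dual_prototype_loss(gen_feats, real_protos):
    """Sum of two L2 terms:
      1) each generated feature close to its nearest real clustering prototype;
      2) the class prototype (mean) of the generated features close to the
         nearest real clustering prototype."""
    d = ((gen_feats[:, None, :] - real_protos[None]) ** 2).sum(-1)
    inst_term = d.min(1).mean()
    class_proto = gen_feats.mean(0)
    class_term = ((real_protos - class_proto) ** 2).sum(-1).min()
    return inst_term + class_term
```

During training of the generative model, this loss would be added to the generator's objective so that synthesized features stay near the empirical distribution of the real features rather than drifting toward the semantic space alone.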