Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

• Academic Research •

细粒度视觉分类:深度成对特征对比交互算法

汪敏,赵鹏,郭鑫平,闵帆   

  1. 西南石油大学 电气信息学院,成都 610500
  2. 西南石油大学 计算机科学学院,成都 610500
  3. 西南石油大学 人工智能研究所,成都 610500

Fine-Grained Visual Categorization: Deep Pairwise Feature Comparison Interaction Algorithm

WANG Min, ZHAO Peng, GUO Xinping, MIN Fan   

  1. School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu 610500, China
  2. School of Computer Science, Southwest Petroleum University, Chengdu 610500, China
  3. Institute for Artificial Intelligence, Southwest Petroleum University, Chengdu 610500, China

摘要: 由于高类内和低类间方差,细粒度图像识别成为计算机视觉领域一项极具挑战性的研究课题。经典的细粒度图像识别方法采用单输入单输出的方式,限制了模型从成对图像中对比学习推理的能力。受人类在判别细粒度图像时的行为启发,提出了深度成对特征对比交互细粒度分类算法(PCI),深度对比寻找图像对之间的共同、差异特征,有效提升细粒度识别精度。首先,PCI建立正负对输入策略,提取细粒度图像的成对深度特征。其次,建立深度成对特征交互机制,实现成对深度特征的全局信息学习、深度对比以及深度自适应交互。最后,建立成对特征对比学习机制,通过对比学习约束成对深度细粒度特征,增大正对之间的相似性并减小负对之间的相似性。在流行的细粒度数据集CUB-200-2011、Stanford Dogs、Stanford Cars以及FGVC-Aircraft上开展了广泛的实验,实验结果表明PCI的性能优于当前的SOTA方法。

关键词: 细粒度, 图像分类, 深度神经网络, 对比学习, 注意力机制

Abstract: Fine-grained visual categorization is an important but challenging task in computer vision because of high intra-class variance and low inter-class variance. Classical fine-grained image recognition methods take a single image as input and produce a single output, which limits the model's ability to learn and reason by comparing paired images. Inspired by how humans discriminate between fine-grained images, this paper proposes a deep pairwise feature comparison and interaction algorithm for fine-grained classification (PCI), which deeply compares image pairs to find their common and differing features and thereby improves fine-grained recognition accuracy. First, PCI establishes a positive-negative pair input strategy to extract pairwise deep features from fine-grained images. Second, a deep pairwise feature interaction mechanism is established to realize global information learning, deep comparison, and deep adaptive interaction of the pairwise deep features. Finally, a pairwise feature contrastive learning mechanism constrains the pairwise deep fine-grained features through contrastive learning, increasing the similarity between positive pairs and reducing the similarity between negative pairs. Extensive experiments on the popular fine-grained datasets CUB-200-2011, Stanford Dogs, Stanford Cars, and FGVC-Aircraft show that PCI outperforms current state-of-the-art (SOTA) methods.

Key words: fine-grained, image classification, deep neural network, contrastive learning, attention mechanism
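
The abstract states that the contrastive learning mechanism increases similarity between positive pairs and reduces it between negative pairs, but it does not give the loss itself. The snippet below is only a minimal sketch of that general idea, not the PCI loss: the function names, the use of cosine similarity, and the `margin` value are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_contrastive_loss(feat_a, feat_b, same_class, margin=0.5):
    """Hypothetical margin-based loss for one image pair (not the paper's loss).

    Positive pairs (same class) are penalized for low similarity.
    Negative pairs are penalized only when similarity exceeds `margin`.
    """
    sim = cosine_similarity(feat_a, feat_b)
    if same_class:
        return 1.0 - sim
    return max(0.0, sim - margin)


# Toy 2-D "deep features" standing in for backbone outputs (illustrative values).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]       # same class, nearly aligned with the anchor
hard_negative = [0.8, 0.6]  # different class, but still fairly similar

print(pairwise_contrastive_loss(anchor, positive, same_class=True))
print(pairwise_contrastive_loss(anchor, hard_negative, same_class=False))
```

Minimizing a loss of this shape pulls positive-pair features together and pushes apart only those negative pairs that are still more similar than the margin, which matches the behavior described in the abstract at a high level.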