Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue (1): 149-158. DOI: 10.3778/j.issn.1673-9418.1902025

Section: Graphics and Images

Image Classification Algorithm Based on Classification Activation Map Enhancement

YANG Menglin, ZHANG Wensheng   

  1. Research Center of Precision Sensing and Control, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
  Online: 2020-01-01   Published: 2020-01-09

Abstract: The classification activation map (CAM) suffers from sparsity, discontinuity and incompleteness, and most current research uses it only for visual analysis. To address these problems, this paper first uses dilated convolution to design an automatically weighted multi-scale feature learning method that compensates for the defects of CAM, and then combines these multi-scale features with the CAM generation procedure to obtain a multi-scale CAM generation method. Further, the multi-scale CAM is embedded into the network to form an end-to-end structure that enhances classification performance. Taking ResNet as the backbone, this paper proposes a classification enhancement model, ResNet-CE. Extensive experiments are conducted with ResNet-CE on three publicly available datasets: CIFAR10, CIFAR100 and STL10. The results show that ResNet-CE clearly improves on ResNet models with a comparable number of parameters on all three datasets, reducing the error rate by 0.23%, 3.56% and 7.96%, respectively, and that its classification performance is better than that of most mainstream classification models. The proposed method can easily be transferred to existing classification models to improve their classification performance. At the same time, it retains the ability to visualize and interpret the basis of the model's decisions, which is of practical value in applications such as disease recognition in medical imaging and scene recognition in autonomous driving.

Key words: image classification, classification activation map (CAM), multiscale, visualization, interpretability
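The abstract names two building blocks without giving implementation details: class activation maps and automatically weighted multi-scale features learned with dilated convolutions. The pure-Python sketch below illustrates both under stated assumptions. `class_activation_map` follows the standard CAM formulation of Zhou et al. (2016), in which the map for class c is the classifier-weighted sum of the final convolutional channel maps. The softmax fusion of dilation branches in `multiscale_features` is an assumed stand-in for the paper's "automatic weighting", and all function names are hypothetical. This is not the authors' ResNet-CE implementation.

```python
import math
from typing import List

Grid = List[List[float]]  # a single-channel H x W feature map


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """3x3 dilated convolution (cross-correlation, as in deep-learning
    frameworks), zero-padded so the output has the same size as the input."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    # Kernel taps are spaced `dilation` pixels apart.
                    r = i + (di - 1) * dilation
                    c = j + (dj - 1) * dilation
                    if 0 <= r < h and 0 <= c < w:  # outside = zero padding
                        acc += kernel[di][dj] * x[r][c]
            out[i][j] = acc
    return out


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multiscale_features(x: Grid, kernels: List[Grid], dilations: List[int],
                        logits: List[float]) -> Grid:
    """Weighted sum of dilated-conv branches. ASSUMPTION (hypothetical scheme):
    branch weights are softmax(logits), with `logits` as learnable parameters."""
    h, w = len(x), len(x[0])
    fused = [[0.0] * w for _ in range(h)]
    for kernel, d, a in zip(kernels, dilations, softmax(logits)):
        branch = dilated_conv2d(x, kernel, d)
        for i in range(h):
            for j in range(w):
                fused[i][j] += a * branch[i][j]
    return fused


def class_activation_map(features: List[Grid], fc_weights: List[List[float]],
                         cls: int) -> Grid:
    """Standard CAM: M_c(x, y) = sum_k w_k^c * f_k(x, y), where
    fc_weights[cls][k] connects the pooled channel k to class `cls`."""
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for k, f_k in enumerate(features):
        w_kc = fc_weights[cls][k]
        for i in range(h):
            for j in range(w):
                cam[i][j] += w_kc * f_k[i][j]
    return cam


if __name__ == "__main__":
    ones = [[1.0] * 5 for _ in range(5)]
    box = [[1.0] * 3 for _ in range(3)]
    # With dilation 2, the centre pixel sees 9 in-bounds taps, a corner only 4.
    y = dilated_conv2d(ones, box, dilation=2)
    print(y[2][2], y[0][0])  # 9.0 4.0
```

Plain nested lists keep the sketch dependency-free. In a real model, a framework layer such as PyTorch's `torch.nn.Conv2d` with `kernel_size=3`, `dilation=d` and `padding=d` would replace `dilated_conv2d`, and larger dilation rates widen the receptive field, which is the property the abstract relies on to make the CAM less sparse and more complete.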