Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (4): 938-949. DOI: 10.3778/j.issn.1673-9418.2010031

• Graphics and Image •


Fine-Grained Image Classification Model Based on Bilinear Aggregate Residual Attention

LI Kuankuan, LIU Libo+

  1. School of Information Engineering, Ningxia University, Yinchuan 750021, China
  • Received: 2020-10-12 Revised: 2021-01-07 Online: 2022-04-01 Published: 2021-02-04
  • Corresponding author: + E-mail: liulib@163.com
  • About author:LI Kuankuan, born in 1995, M.S. candidate. His research interests include image processing and computer vision.
    LIU Libo, born in 1974, Ph.D., professor, M.S. supervisor. Her research interests include graphics and intelligent information processing.
  • Supported by:
    Natural Science Foundation of Ningxia (2020AAC03031); National Natural Science Foundation of China (61862050); Scientific Research Innovation Project of First-Class Western Universities (ZKZD2017005)


Abstract:

In fine-grained image classification tasks, differences in local information between categories are subtle, which often leaves a model with insufficient ability to capture discriminative features and with weak interdependence between channels during feature extraction. As a result, the network cannot learn salient and diverse category features, which ultimately degrades classification performance. To address these problems, this paper proposes a bilinear aggregate residual attention network (BARAN). First, building on the original bilinear convolutional neural network model (B-CNN), the feature extraction sub-network is replaced with an aggregate residual network of stronger learning capacity, improving the feature capture ability of the network. Then, a distraction attention module is embedded in each aggregate residual block, so that the network focuses on integrating cross-dimensional features and strengthens the close association between channels during feature acquisition. Finally, the fused bilinear feature map is fed into a cross-channel attention module, whose discriminative and distinctive sub-components further learn more subtle, diverse and mutually exclusive inter-class information that is easily confused at the local level. Experimental results show that the classification accuracy on the fine-grained image datasets CUB-200-2011, FGVC-Aircraft and Stanford Cars reaches 87.9%, 92.9% and 94.7%, respectively, outperforming most mainstream methods; compared with the original B-CNN model, the improvements are 3.8, 8.8 and 3.4 percentage points, respectively.
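The bilinear pooling step at the core of B-CNN, which BARAN builds on, combines two CNN feature maps by an outer product at each spatial location and then pools over space. The following is an illustrative NumPy sketch under assumed shapes and with the signed-square-root/L2 normalization commonly used with B-CNN; it is not the authors' implementation.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """feat_a: (H*W, C1), feat_b: (H*W, C2) -> (C1*C2,) image descriptor."""
    # Outer product at every spatial location, summed over all locations:
    # equivalent to the single matrix product feat_a^T @ feat_b.
    pooled = feat_a.T @ feat_b                  # (C1, C2)
    x = pooled.reshape(-1)                      # flatten to a C1*C2 vector
    # Signed square-root and L2 normalization, as commonly used in B-CNN.
    x = np.sign(x) * np.sqrt(np.abs(x))
    return x / (np.linalg.norm(x) + 1e-12)

rng = np.random.default_rng(0)
fa = rng.standard_normal((14 * 14, 8))  # e.g. a 14x14 spatial grid, 8 channels
fb = rng.standard_normal((14 * 14, 8))
desc = bilinear_pool(fa, fb)
print(desc.shape)  # (64,)
```

When the two streams share one backbone (as in the symmetric B-CNN variant), `feat_a` and `feat_b` are the same tensor and the descriptor captures pairwise channel interactions of a single feature map.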

Key words: fine-grained image classification, aggregate residuals, distracting attention, cross-channel attention, diversified feature
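Two of the keywords above concern channel attention, i.e. reweighting feature channels to strengthen their interdependence. The paper's distraction and cross-channel modules are not specified on this page, so the sketch below shows a generic squeeze-and-excitation-style channel reweighting, a common building block for this purpose; all shapes and weight names are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (H, W, C). Returns the feature map reweighted per channel."""
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = feat.mean(axis=(0, 1))
    # Excite: two-layer bottleneck (ReLU then sigmoid) yields per-channel
    # weights in (0, 1) that encode inter-channel dependence.
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Rescale: broadcast the channel weights over H and W.
    return feat * s

rng = np.random.default_rng(1)
C, r = 8, 2                                  # channels, bottleneck ratio
feat = rng.standard_normal((4, 4, C))
w1 = rng.standard_normal((C // r, C))        # squeeze projection
w2 = rng.standard_normal((C, C // r))        # excite projection
out = channel_attention(feat, w1, w2)
print(out.shape)  # (4, 4, 8)
```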

CLC number: