Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (4): 938-949. DOI: 10.3778/j.issn.1673-9418.2010031
Corresponding author: E-mail: liulib@163.com
Received:
2020-10-12
Revised:
2021-01-07
Online:
2022-04-01
Published:
2021-02-04
About author:
LI Kuankuan, born in 1995 in Shijiazhuang, Hebei, M.S. candidate. His research interests include image processing and computer vision.
Abstract:
In fine-grained image classification, the small inter-class differences in local information typically leave models with insufficient representational power, weak interdependence among feature channels, and an inability to capture salient and diverse features. To address these problems, a bilinear aggregate residual attention network (BARAN) is proposed. First, building on the bilinear CNN (B-CNN), the original feature-extraction sub-networks are replaced with aggregated residual networks of stronger learning capacity, improving the network's ability to capture features. Next, a split-attention module is embedded in each aggregated residual block, so that the network concentrates on integrating cross-dimensional features and tightens the coupling between channels during feature extraction. Finally, the fused bilinear feature map is fed into a mutual-channel attention (MCA) module, whose discriminative and diversity sub-components further learn subtler, more diverse, and mutually exclusive local information that is easily confused between classes. Experimental results show that the method reaches classification accuracies of 87.9%, 92.9%, and 94.7% on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars fine-grained datasets, outperforming most mainstream methods, with gains of 0.038, 0.088, and 0.034 over the original B-CNN.
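The bilinear pooling step that B-CNN contributes to this architecture can be sketched in a few lines. This is a generic illustration of the standard B-CNN formulation (outer-product pooling over spatial positions, signed square root, L2 normalization), not code released with this paper:

```python
import numpy as np

def bilinear_pool(xa, xb):
    """Bilinear pooling of two feature maps, B-CNN style.

    xa: (Ca, H, W) features from stream A
    xb: (Cb, H, W) features from stream B
    Returns an L2-normalized vector of length Ca*Cb.
    """
    ca, h, w = xa.shape
    cb = xb.shape[0]
    # Average of outer products over all H*W spatial positions.
    phi = xa.reshape(ca, h * w) @ xb.reshape(cb, h * w).T / (h * w)
    phi = phi.reshape(-1)
    # Signed square root, then L2 normalization, as in B-CNN.
    phi = np.sign(phi) * np.sqrt(np.abs(phi))
    return phi / (np.linalg.norm(phi) + 1e-12)

rng = np.random.default_rng(0)
v = bilinear_pool(rng.standard_normal((8, 4, 4)),
                  rng.standard_normal((8, 4, 4)))
print(v.shape)  # (64,)
```

With 512-channel streams, as in the tables below, the pooled descriptor would have 512×512 entries before any dimensionality reduction.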
LI Kuankuan, LIU Libo. Fine-Grained Image Classification Model Based on Bilinear Aggregate Residual Attention[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 938-949.
Datasets | Category | Training | Testing |
---|---|---|---|
CUB-200-2011 | 200 | 5 994 | 5 794 |
FGVC-Aircraft | 100 | 6 667 | 3 333 |
Stanford Cars | 196 | 8 144 | 8 041 |
Table 1 Training and testing set sizes of each dataset
Datasets | cnums/cgroups |
---|---|
CUB-200-2011 | 2/88, 3/112 |
FGVC-Aircraft | 5/100, 6/2 |
Stanford Cars | 2/76, 3/120 |
Table 2 ξ value assignment using BARAN with 512 feature channels
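Read as cnums feature channels allocated to each of cgroups classes, the assignments in Table 2 exactly cover the 512 bilinear feature channels. A quick arithmetic check (the dictionary below simply restates the table):

```python
# Table 2 read as: cnums channels for each of cgroups classes.
alloc = {
    "CUB-200-2011": [(2, 88), (3, 112)],
    "FGVC-Aircraft": [(5, 100), (6, 2)],
    "Stanford Cars": [(2, 76), (3, 120)],
}
# Total channels claimed by each dataset's assignment.
totals = {name: sum(c * g for c, g in pairs) for name, pairs in alloc.items()}
print(totals)  # every dataset uses all 512 channels
```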
Method | Base model | Params/10^6 | Accuracy/% |
---|---|---|---|
B-CNN[M,D] | VGG16-M+VGG16-D | 13.8 | 84.1 |
BARN[2×64d] | ResneXt29×2+SA | 34.8 | 84.8 |
BARN[4×64d] | ResneXt29×2+SA | 34.6 | 85.2 |
BARN[8×64d] | ResneXt29×2+SA | 34.4 | 85.5 |
BARN[32×4d] | ResneXt29×2+SA | 18.2 | 85.9 |
Table 3 Experimental comparison of ResneXt with embedded SA module under different cardinalities
Method | Base model | CUB-200-2011 | FGVC-Aircraft | Stanford Cars |
---|---|---|---|---|
BARN+MCA (CWA) | ResneXt29×2+SA | 63.85 | 88.79 | 89.87 |
BARN+MCA | ResneXt29×2+SA | 27.35 | 79.88 | 70.23 |
BARN+MCA | ResneXt29×2+SA | 65.07 | 88.28 | 90.04 |
BARN+MCA | ResneXt29×2+SA | 66.47 | 89.90 | 91.34 |
Table 4 Ablation experiment of different components of MCA module (accuracy/%)
Method | Base model | CUB-200-2011 | FGVC-Aircraft | Stanford Cars |
---|---|---|---|---|
B-CNN[15] | VGG16 | 84.1 | 84.1 | 91.3 |
MaxEnt[22] | B-CNN | 85.3 | 86.1 | 92.8 |
PC[23] | B-CNN | 85.6 | 85.8 | 92.5 |
PC[23] | DenseNet161 | 86.9 | 89.2 | 92.9 |
MA-CNN[24] | VGG19 | 86.5 | 89.9 | 92.8 |
DFL-CNN[25] | ResNet50 | 87.4 | 91.7 | 93.9 |
NTS-Net[5] | ResNet50 | 87.5 | 91.4 | 93.9 |
TASN[26] | ResNet50 | 87.9 | — | 93.8 |
DCL[27] | VGG16 | 86.9 | 91.2 | 94.1 |
WPS-CPM[28] | GoogLeNet+ResNet50 | 90.4 | — | — |
Bi-Modal PMA[29] | ResNet50 | 87.5 | 90.8 | 93.1 |
BARAN (proposed) | B-CNN+ResneXt29 | 87.9 | 92.9 | 94.7 |
Table 5 Experimental comparison of different weakly supervised fine-grained image classification methods (accuracy/%)
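The per-dataset gains over B-CNN quoted in the abstract (0.038, 0.088, 0.034) follow directly from the B-CNN and BARAN rows of Table 5, expressed as fractions of 1:

```python
# Accuracy (%) for B-CNN and BARAN, taken from Table 5.
bcnn = {"CUB-200-2011": 84.1, "FGVC-Aircraft": 84.1, "Stanford Cars": 91.3}
baran = {"CUB-200-2011": 87.9, "FGVC-Aircraft": 92.9, "Stanford Cars": 94.7}
# Gains as fractions of 1, matching the figures quoted in the abstract.
gains = {k: round((baran[k] - bcnn[k]) / 100, 3) for k in bcnn}
print(gains)
```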
[1] | ZHANG N, DONAHUE J, GIRSHICK R B, et al. Part-based R-CNNs for fine-grained category detection[C]// LNCS 8689: Proceedings of the 13th European Conference on Computer Vision, Zurich, Sep 6-12, 2014. Cham: Springer, 2014: 834-849. |
[2] | LUO J H, WU J X. A survey on fine-grained image categorization using deep convolutional features[J]. Acta Automatica Sinica, 2017, 43(8): 1306-1318. |
[3] | UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition[J]. International Journal of Computer Vision, 2013, 104(2): 154-171. |
[4] | LIN D, SHEN X Y, LU C W, et al. Deep LAC: deep localization, alignment and classification for fine-grained recognition[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 1666-1674. |
[5] | YANG Z, LUO T G, WANG D, et al. Learning to navigate for fine-grained classification[C]// LNCS 11218: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 420-435. |
[6] | BORJI A, ITTI L. State-of-the-art in visual attention modeling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 185-207. |
[7] | PENG Y H, HE X T, ZHAO J J. Object-part attention model for fine-grained image classification[J]. IEEE Transactions on Image Processing, 2018, 27(3): 1487-1500. |
[8] | WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19. |
[9] | HAN K, GUO J Y, ZHANG C, et al. Attribute-aware attention model for fine-grained representation learning[C]// Proceedings of the 2018 ACM Multimedia Conference, Seoul, Oct 22-26, 2018. New York: ACM, 2018: 2040-2048. |
[10] | GAO Y, HAN X T, WANG X, et al. Channel interaction networks for fine-grained image categorization[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 10818-10825. |
[11] | HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023. |
[12] | LI X, WANG W H, HU X L, et al. Selective kernel networks[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 510-519. |
[13] | ZHANG H, WU C R, ZHANG Z Y, et al. ResNeSt: split-attention networks[J]. arXiv:2004.08955, 2020. |
[14] | CHANG D L, DING Y F, XIE J Y, et al. The devil is in the channels: mutual-channel loss for fine-grained image classification[J]. IEEE Transactions on Image Processing, 2020, 29: 4683-4695. |
[15] | LIN T Y, ROYCHOWDHURY A, MAJI S. Bilinear CNNs for fine-grained visual recognition[J]. arXiv:1504.07889, 2015. |
[16] | XIE S N, GIRSHICK R B, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 5987-5995. |
[17] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778. |
[18] | PASZKE A, GROSS S, CHINTALA S, et al. Automatic differentiation in PyTorch[C]// Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, Oct 28, 2017. Red Hook: Curran Associates, 2017: 1-4. |
[19] | WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset[R]. Pasadena: California Institute of Technology, 2011. |
[20] | MAJI S, RAHTU E, KANNALA J, et al. Fine-grained visual classification of aircraft[J]. arXiv:1306.5151, 2013. |
[21] | KRAUSE J, STARK M, DENG J, et al. 3D object representations for fine-grained categorization[C]// Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Dec 1-8, 2013. Washington: IEEE Computer Society, 2013: 554-561. |
[22] | DUBEY A, GUPTA O, RASKAR R, et al. Maximum-entropy fine grained classification[C]// Proceedings of the Annual Conference on Neural Information Processing Systems, Montréal, Dec 3-8, 2018: 635-645. |
[23] | DUBEY A, GUPTA O, GUO P, et al. Pairwise confusion for fine-grained visual classification[C]// LNCS 11216: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 71-88. |
[24] | ZHENG H L, FU J L, MEI T, et al. Learning multi-attention convolutional neural network for fine-grained image recognition[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 5219-5227. |
[25] | WANG Y M, MORARIU V I, DAVIS L S. Learning a discriminative filter bank within a CNN for fine-grained recognition[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 4148-4157. |
[26] | ZHENG H L, FU J L, ZHA Z J, et al. Looking for the devil in the details: learning trilinear attention sampling network for fine-grained image recognition[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 5012-5021. |
[27] | CHEN Y, BAI Y L, ZHANG W, et al. Destruction and construction learning for fine-grained image recognition[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 5157-5166. |
[28] | GE W F, LIN X R, YU Y Z. Weakly supervised complementary parts models for fine-grained image classification from the bottom up[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 3034-3043. |
[29] | SONG K T, WEI X S, SHU X B, et al. Bi-modal progressive mask attention for fine-grained recognition[J]. IEEE Transactions on Image Processing, 2020, 29: 7006-7018. |
[30] | SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 618-626. |
[31] | YANG M L, ZHANG W S. Image classification algorithm based on classification activation map enhancement[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(1): 149-158. |