Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (4): 938-949. DOI: 10.3778/j.issn.1673-9418.2010031

• Graphics and Image •


Fine-Grained Image Classification Model Based on Bilinear Aggregate Residual Attention

LI Kuankuan, LIU Libo+

  1. School of Information Engineering, Ningxia University, Yinchuan 750021, China
  • Received: 2020-10-12 Revised: 2021-01-07 Online: 2022-04-01 Published: 2021-02-04
  • Corresponding author: + E-mail: liulib@163.com
  • About author:LI Kuankuan, born in 1995, M.S. candidate. His research interests include image processing and computer vision.
    LIU Libo, born in 1974, Ph.D., professor, M.S. supervisor. Her research interests include graphics and intelligent information processing.
  • Supported by:
    Natural Science Foundation of Ningxia (2020AAC03031); National Natural Science Foundation of China (61862050); Scientific Research Innovation Project of First-Class Western Universities (ZKZD2017005)


Abstract:

In fine-grained image classification tasks, differences in local information between categories are subtle, which often leaves a model with insufficient ability to capture discriminative features and with weak interdependence between channels during feature extraction. As a result, the network cannot learn salient and diverse category features, which ultimately degrades classification performance. To address these problems, this paper proposes a bilinear aggregate residual attention network (BARAN). First, building on the original bilinear convolutional neural network model (B-CNN), the feature extraction sub-network is replaced with an aggregate residual network of stronger learning capacity, improving the feature capture ability of the network. Then, a distraction attention module is embedded in each aggregate residual block, so that the network focuses on integrating cross-dimensional features and strengthens the close association between channels during feature acquisition. Finally, the fused bilinear feature map is fed into a cross-channel attention module, whose discriminative and distinctive sub-components further learn more subtle, diverse and mutually exclusive inter-class information that is easily confused at the local level. Experimental results show that the classification accuracy on the fine-grained image datasets CUB-200-2011, FGVC-Aircraft and Stanford Cars reaches 87.9%, 92.9% and 94.7%, respectively, outperforming most mainstream methods; compared with the original B-CNN model, the improvements are 3.8, 8.8 and 3.4 percentage points, respectively.
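The bilinear pooling step at the core of B-CNN, which BARAN builds on, combines two CNN feature maps by an outer product at each spatial location and then pools over space. The following is an illustrative NumPy sketch under assumed shapes and with the signed-square-root/L2 normalization commonly used with B-CNN; it is not the authors' implementation.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """feat_a: (H*W, C1), feat_b: (H*W, C2) -> (C1*C2,) image descriptor."""
    # Outer product at every spatial location, summed over all locations:
    # equivalent to the single matrix product feat_a^T @ feat_b.
    pooled = feat_a.T @ feat_b                  # (C1, C2)
    x = pooled.reshape(-1)                      # flatten to a C1*C2 vector
    # Signed square-root and L2 normalization, as commonly used in B-CNN.
    x = np.sign(x) * np.sqrt(np.abs(x))
    return x / (np.linalg.norm(x) + 1e-12)

rng = np.random.default_rng(0)
fa = rng.standard_normal((14 * 14, 8))  # e.g. a 14x14 spatial grid, 8 channels
fb = rng.standard_normal((14 * 14, 8))
desc = bilinear_pool(fa, fb)
print(desc.shape)  # (64,)
```

When the two streams share one backbone (as in the symmetric B-CNN variant), `feat_a` and `feat_b` are the same tensor and the descriptor captures pairwise channel interactions of a single feature map.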

Key words: fine-grained image classification, aggregate residuals, distracting attention, cross-channel attention, diversified feature
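Two of the keywords above concern channel attention, i.e. reweighting feature channels to strengthen their interdependence. The paper's distraction and cross-channel modules are not specified on this page, so the sketch below shows a generic squeeze-and-excitation-style channel reweighting, a common building block for this purpose; all shapes and weight names are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (H, W, C). Returns the feature map reweighted per channel."""
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = feat.mean(axis=(0, 1))
    # Excite: two-layer bottleneck (ReLU then sigmoid) yields per-channel
    # weights in (0, 1) that encode inter-channel dependence.
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Rescale: broadcast the channel weights over H and W.
    return feat * s

rng = np.random.default_rng(1)
C, r = 8, 2                                  # channels, bottleneck ratio
feat = rng.standard_normal((4, 4, C))
w1 = rng.standard_normal((C // r, C))        # squeeze projection
w2 = rng.standard_normal((C, C // r))        # excite projection
out = channel_attention(feat, w1, w2)
print(out.shape)  # (4, 4, 8)
```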

CLC number: