Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2021, Vol. 15 ›› Issue (12): 2390-2400. DOI: 10.3778/j.issn.1673-9418.2011028

• Graphics and Image •

Small Objects Detection Algorithm with Multi-scale Channel Attention Fusion Network

LI Wentao (李文涛), PENG Li (彭力)

  1. Engineering Research Center of Internet of Things Technology Applications of the Ministry of Education, School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2021-12-01 Published: 2021-12-10

Abstract:

Current small object detection algorithms rely mainly on purpose-built feature fusion modules, which makes it difficult to balance detection accuracy against model complexity. In addition, small objects carry less information than regular-sized objects, so their features are harder to extract. To address these two problems, a channel attention module based on a local cross-channel interaction strategy without dimensionality reduction is adopted. The module associates information across channels and learns the correlation between the features of different channels by assigning a weight to each channel's features. An improved feature fusion module is also added so that the network can use both low-level and high-level features for multi-scale object detection, which improves detection accuracy for small objects, whose detection relies mainly on low-level features. The backbone network is ResNet, chosen for its strong feature representation ability and speed; it captures more network features while ensuring that the network converges. The loss function is Focal Loss, which reduces the weight of easy-to-classify samples so that training focuses more on hard-to-classify samples. The proposed framework achieves an mAP of 82.7% on the VOC dataset and 86.8% on an aerial photography dataset.
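The abstract does not spell out the internals of the channel attention module. As a minimal pure-Python sketch of one common way to realize local cross-channel interaction without dimensionality reduction, the example below pools each channel to a scalar, runs a small 1D convolution across neighboring channels, applies a sigmoid, and rescales each channel by its weight. The function name `channel_attention`, the kernel size of 3, and the uniform kernel values are illustrative assumptions, not the paper's configuration; in a trained network the kernel weights would be learned.

```python
import math


def channel_attention(feature_map, kernel):
    """Reweight channels using a 1D convolution over per-channel averages.

    feature_map: list of C channels, each an H x W list of rows of floats.
    kernel: odd-length list of weights (a hypothetical stand-in for
            learned 1D convolution weights).
    Returns (weights, scaled_feature_map).
    """
    k = len(kernel)
    assert k % 2 == 1, "kernel length must be odd"
    pad = k // 2

    # Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

    # Zero-pad so that every channel has k neighbors to interact with.
    padded = [0.0] * pad + pooled + [0.0] * pad

    # 1D convolution across the channel axis followed by a sigmoid.
    # Each weight depends only on k nearby channels, and the channel
    # dimension is never compressed.
    weights = []
    for c in range(len(pooled)):
        z = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-z)))

    # Rescale every value in a channel by that channel's weight.
    scaled = [
        [[w * value for value in row] for row in channel]
        for w, channel in zip(weights, feature_map)
    ]
    return weights, scaled


# Toy input: 3 channels of size 2 x 2 (assumed values for illustration).
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
weights, scaled = channel_attention(features, kernel=[1 / 3, 1 / 3, 1 / 3])
```

Because the convolution slides over the pooled channel descriptors directly, the number of extra parameters equals the kernel size, which is one way such a module can stay lightweight.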

Key words: object detection, channel attention, convolutional neural network (CNN), feature fusion
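To illustrate how Focal Loss reduces the weight of easy-to-classify samples, as the abstract describes, here is a minimal sketch of the standard binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The values gamma = 2 and alpha = 0.25 are commonly used defaults and are assumptions here; the abstract does not state the settings used in the paper, and the helper names are hypothetical.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one prediction (p in (0, 1), y in {0, 1})."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, strictly in (0, 1).
    y: ground-truth label, 1 or 0.
    gamma, alpha: assumed defaults, not values taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing factor
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (confident and correct) versus a hard positive.
easy = focal_loss(0.95, 1)
hard = focal_loss(0.10, 1)

# Ratio of focal loss to plain cross-entropy is alpha_t * (1 - p_t) ** gamma.
easy_ratio = easy / cross_entropy(0.95, 1)  # 0.25 * 0.05 ** 2
hard_ratio = hard / cross_entropy(0.10, 1)  # 0.25 * 0.90 ** 2
```

With these assumed settings the easy sample keeps a far smaller fraction of its cross-entropy loss than the hard sample does, so gradients during training are dominated by hard samples.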