Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (5): 1102-1111. DOI: 10.3778/j.issn.1673-9418.2109099

• Graphics · Image •

Object Detector with Residual Learning and Multi-scale Feature Enhancement

JIA Tianhao (贾天豪), PENG Li (彭力), DAI Feifei (戴菲菲)

  1. Engineering Research Center of Internet of Things Technology Applications (School of Internet of Things Engineering, Jiangnan University), Ministry of Education, Wuxi, Jiangsu 214122, China
  2. Taizhou Institute of Quality and Safety Testing, Taizhou, Zhejiang 318020, China
  • Online: 2023-05-01 Published: 2023-05-01

Abstract: Deep learning has achieved great success in computer vision, but small object detection remains a challenging problem in object detection. To address the low resolution, blurred appearance, and limited information content of small objects, an object detector that introduces residual learning and multi-scale feature enhancement is proposed. First, an enhanced feature mapping block based on residual learning is introduced into the backbone network. Through channel averaging and normalization, it makes the model focus on object regions rather than the background, and it provides additional semantic information for the effective feature layers while maintaining detection speed. Then, a context-sensitive feature fusion block further enlarges the receptive field of the effective feature maps and fuses the shallow and deep feature layers used for prediction, improving detection performance at low resolution. Finally, a dual attention block suppresses background noise and embeds key features into the attention; while preserving spatial information, it strengthens the information association between channels, thereby enhancing the expressive power of the features. To better detect small objects, the number of prior boxes on the shallow feature maps is also adjusted. Experimental results show that on the PASCAL VOC2007 dataset, the algorithm achieves a detection accuracy (mAP) of 79.9% at a 300×300 input scale, 2.7 percentage points higher than SSD, and that the detection accuracy for the small-object classes bird, bottle, chair, and plant improves by 5.1, 7.5, 3.9, and 7.2 percentage points, respectively. On the self-built OAP aerial dataset, the detection accuracy (mAP) is 82.7%.

Key words: object detection, residual learning, convolutional neural network (CNN), attention mechanism
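
The abstract describes the enhanced feature mapping block only at a high level: channel averaging, normalization, and residual learning. The following is a minimal sketch of how those three steps could fit together, not the authors' implementation. It assumes min-max normalization and a residual of the form `out = feat + feat * weight`; the function names `channel_average`, `min_max_normalize`, and `enhance` are hypothetical, and the real block presumably operates on convolutional tensors rather than nested lists.

```python
def channel_average(feat):
    """Average a (C, H, W) feature map over channels, giving an (H, W) map."""
    channels = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    return [[sum(feat[c][y][x] for c in range(channels)) / channels
             for x in range(width)]
            for y in range(height)]


def min_max_normalize(spatial, eps=1e-8):
    """Rescale an (H, W) map to roughly [0, 1]. Min-max is an assumed choice."""
    flat = [v for row in spatial for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo + eps) for v in row] for row in spatial]


def enhance(feat):
    """Residual re-weighting (assumed form): out = feat + feat * weight."""
    weight = min_max_normalize(channel_average(feat))
    return [[[v + v * weight[y][x] for x, v in enumerate(row)]
             for y, row in enumerate(channel)]
            for channel in feat]


# Toy 2-channel, 2x2 feature map; the bottom-right cell responds most strongly.
feat = [
    [[0.0, 1.0], [1.0, 4.0]],
    [[0.0, 1.0], [1.0, 2.0]],
]
out = enhance(feat)
```

In this toy input the location with the highest channel-averaged response is amplified the most, while weakly responding locations pass through nearly unchanged. That is one plausible reading of "focus on object regions rather than the background"; the identity term in the residual keeps the original features available to later layers.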