Object Detector with Residual Learning and Multi-scale Feature Enhancement

doi:10.3778/j.issn.1673-9418.2109099

Abstract

Abstract: Deep learning has achieved great success in computer vision, but small object detection remains a challenging problem in object detection. To address the low resolution, blurred appearance, and limited information of small objects, an object detector that introduces residual learning and multi-scale feature enhancement is proposed. First, an enhanced feature mapping block based on residual learning is introduced into the backbone network. Through channel averaging and normalization, it makes the model focus on object regions rather than the background, and it supplies additional semantic information to the effective feature layers while keeping detection speed in check. Next, a feature fusion block that is sensitive to context information further enlarges the receptive field of the effective feature maps and fuses the shallow and deep feature layers used for prediction, which improves detection performance at low resolution. Finally, a dual attention block suppresses background noise and embeds key features into the attention. It preserves spatial information while strengthening the information association between channels, thereby enhancing the expressive ability of the features. To detect small objects better, the number of prior boxes on the shallow feature maps is also adjusted. Experimental results show that on the PASCAL VOC2007 dataset, the detection accuracy (mAP) of the algorithm at a 300×300 input scale is 79.9%, which is 2.7 percentage points higher than that of SSD, and the detection accuracy for the small objects bird, bottle, chair, and plant improves by 5.1, 7.5, 3.9, and 7.2 percentage points, respectively. On the self-made OAP aerial dataset, the detection accuracy (mAP) is 82.7%.

Key words: object detection, residual learning, convolutional neural network (CNN), attention mechanism
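
The abstract describes the enhanced feature mapping block only at a high level (channel averaging, normalization, residual learning). The sketch below is one possible reading of that idea, not the authors' implementation: the function name `channel_average_attention`, the min-max normalization, and the residual form `x + x * mask` are all assumptions made for illustration. It uses plain nested lists so it runs without dependencies; a real detector would use a tensor library.

```python
def channel_average_attention(feature_map, eps=1e-6):
    """Re-weight a [C][H][W] feature map with a channel-averaged spatial mask.

    Hypothetical illustration only; the paper's actual block may differ.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])

    # Channel averaging: collapse the C channels into one H x W saliency map.
    saliency = [
        [sum(feature_map[c][i][j] for c in range(channels)) / channels
         for j in range(width)]
        for i in range(height)
    ]

    # Normalization (assumed min-max) maps the saliency values into [0, 1).
    flat = [value for row in saliency for value in row]
    low, high = min(flat), max(flat)
    mask = [[(value - low) / (high - low + eps) for value in row]
            for row in saliency]

    # Residual re-weighting: out = x + x * mask, so the input is never suppressed
    # below its original value and salient positions are amplified.
    return [
        [[feature_map[c][i][j] * (1.0 + mask[i][j]) for j in range(width)]
         for i in range(height)]
        for c in range(channels)
    ]


# Toy 2-channel, 2x2 feature map: the bottom-right position is the most active.
features = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[0.0, 3.0], [2.0, 5.0]],
]
enhanced = channel_average_attention(features)
```

In this toy input the bottom-right position has the largest channel mean, so its activations are roughly doubled, while the top-left position (mean zero) is left unchanged. The residual form is chosen here so that the mask can only add emphasis, which is one common way to combine attention with a skip connection.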

JIA Tianhao, PENG Li, DAI Feifei. Object Detector with Residual Learning and Multi-scale Feature Enhancement[J]. Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2023, 17(5): 1102-1111.
