Detection Algorithm of Small Target in Receptive Field Block

doi:10.3778/j.issn.1673-9418.1912011

Abstract:

In the backbone feature extractor of SSD (single shot multibox detector), an earlier one-stage detector, the 3×3 convolutions increase the number of channels that must be computed. In addition, SSD turns the extracted features directly into feature maps and feeds each map into the prediction module separately, so information is poorly shared between layers. In practical detection the network is dominated by large objects, so small objects, which are more easily missed than large ones, are often overlooked and their detection rate is low. Building on SSD, this paper incorporates the Receptive Field Block (RFB), a receptive-field model based on feature fusion: a feature fusion module is added on top of receptive-field-based feature extraction in the backbone to strengthen small-object detection. The improved framework reaches a mean average precision (mAP) of 81.8% on the public VOC dataset and 82.8% on a self-built aerial dataset focused on small objects, gaining substantially in accuracy at the cost of some speed.
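The central idea of a Receptive Field Block is that parallel branches with different kernel sizes and dilation rates look at regions of different sizes on the same feature map before their outputs are fused. As a minimal sketch, not the authors' code, the snippet below computes each branch's receptive field with the standard recurrence k_eff = d(k-1)+1, r_out = r_in + (k_eff-1)·j_in, j_out = j_in·s. The branch layout (a 1×1 convolution, then a k×k convolution, then a 3×3 convolution with dilation 1, 3 or 5) is an assumption loosely modeled on the original RFB design, since the paper's exact configuration is not given here; `ConvLayer`, `receptive_field` and `BRANCHES` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """One convolution (hypothetical helper): square kernel, stride, dilation."""
    kernel: int
    stride: int = 1
    dilation: int = 1


def receptive_field(layers, r_in=1, j_in=1):
    """Return (receptive field, jump) after applying `layers` in order.

    r_in: receptive field of the input map, in units of the reference grid.
    j_in: distance between adjacent input cells, in the same units.
    """
    r, j = r_in, j_in
    for layer in layers:
        k_eff = layer.dilation * (layer.kernel - 1) + 1  # span of a dilated kernel
        r += (k_eff - 1) * j                             # grow by the new span
        j *= layer.stride                                # strides compound
    return r, j


# Assumed RFB-style branches (NOT the paper's exact configuration).
BRANCHES = {
    "1x1 -> 3x3 (d=1)": [ConvLayer(1), ConvLayer(3, dilation=1)],
    "1x1 -> 3x3 -> 3x3 (d=3)": [ConvLayer(1), ConvLayer(3), ConvLayer(3, dilation=3)],
    "1x1 -> 5x5 -> 3x3 (d=5)": [ConvLayer(1), ConvLayer(5), ConvLayer(3, dilation=5)],
}

if __name__ == "__main__":
    for name, layers in BRANCHES.items():
        r, j = receptive_field(layers)
        print(f"{name}: {r}x{r} cells, jump {j}")
```

Measured in feature-map cells, the three assumed branches cover 3, 9 and 15 cells. The dilated 3×3 kernel with d=5 still has only 9 weights per channel pair yet spans 11×11 cells, which is why dilation is an inexpensive way to widen context. To express the result in image pixels, pass the backbone's receptive field and stride at that layer as `r_in` and `j_in`.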

Key words: computer vision, feature fusion, receptive field, small target, deep learning
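The abstract reports accuracy as mean average precision (mAP). As a hedged illustration of how such a figure is typically computed, the sketch below implements per-class average precision with the all-point interpolation used by the PASCAL VOC devkit from 2010 onward, then averages over classes. It assumes detections are already sorted by confidence and matched to ground truth (for VOC, at IoU ≥ 0.5), so each one is just a true/false-positive flag. The helper names and toy data are hypothetical, and VOC2007 results are often scored with an 11-point variant instead; the abstract does not say which variant the authors used.

```python
def average_precision(is_tp, num_gt):
    """All-point interpolated AP for one class.

    is_tp: True/False flags for this class's detections, already sorted by
           descending confidence and matched to ground truth.
    num_gt: number of ground-truth objects of this class.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    tp_cum = fp_cum = 0
    recalls, precisions = [], []
    for hit in is_tp:
        if hit:
            tp_cum += 1
        else:
            fp_cum += 1
        recalls.append(tp_cum / num_gt)
        precisions.append(tp_cum / (tp_cum + fp_cum))

    # Pad the curve, then make precision non-increasing from right to left
    # (the precision "envelope").
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum rectangle areas wherever recall actually changes.
    ap = 0.0
    for i in range(len(mrec) - 1):
        if mrec[i + 1] != mrec[i]:
            ap += (mrec[i + 1] - mrec[i]) * mpre[i + 1]
    return ap


def mean_average_precision(per_class):
    """per_class maps class name -> (is_tp flags, num_gt); returns the mean AP."""
    if not per_class:
        raise ValueError("need at least one class")
    aps = [average_precision(flags, n) for flags, n in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy, made-up detections for two classes.
    toy = {
        "car": ([True, False, True], 2),
        "person": ([True, True], 4),
    }
    for name, (flags, n) in toy.items():
        print(f"{name}: AP = {average_precision(flags, n):.3f}")
    print(f"mAP = {mean_average_precision(toy):.3f}")
```

Run as a script, the toy data print AP values of 0.833 and 0.500 and an mAP of 0.667. The "person" class shows why missed objects hurt AP: every detection is correct, but recall never exceeds 0.5.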

CHEN Haoran, PENG Li. Detection Algorithm of Small Target in Receptive Field Block[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(2): 346-353.