Small Objects Detection Algorithm with Multi-scale Channel Attention Fusion Network

doi:10.3778/j.issn.1673-9418.2011028

Abstract

Current small object detection algorithms rely mainly on designing various feature fusion modules, which makes it difficult to balance detection performance against model complexity. In addition, compared with regular objects, small objects carry less information, so their features are difficult to extract. To address these two problems, a channel attention module is adopted that uses a local cross-channel interaction strategy without dimensionality reduction. This module associates information across channels and learns the correlation between the features of different channels by assigning a weight to each channel's features. An improved feature fusion module is also added to integrate low-level and high-level features for multi-scale object detection, which improves the accuracy of small object detection, a task that relies mainly on low-level features. The backbone network is ResNet, which offers strong feature representation and high speed and ensures that the network converges while extracting richer features. The loss function is Focal Loss, which reduces the weight of easy-to-classify samples so that the model pays more attention to hard-to-classify samples during training. The algorithm achieves an mAP of 82.7% on the VOC dataset and 86.8% on the aerial photography dataset.

Key words: object detection, channel attention, convolutional neural network (CNN), feature fusion
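
The two mechanisms named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the fixed `kernel` weights (learned in a real network), zero padding at the channel boundaries, and the defaults `gamma=2.0` and `alpha=0.25` are all assumptions made for illustration. `channel_attention` pools each channel to a scalar, mixes neighbouring channels with a small 1D convolution (local cross-channel interaction, no dimensionality reduction), and rescales each channel by the resulting sigmoid weight. `focal_loss` applies the standard binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, kernel):
    """Reweight channels using local cross-channel interaction.

    feature_map: list of C channels, each a list of rows (H x W floats).
    kernel: odd-length list of 1D convolution weights shared by all
            channels (hypothetical fixed values; learned in practice).
    Returns (reweighted feature map, per-channel weights).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    # Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Zero-pad so every channel has a full neighbourhood (an assumption).
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    # 1D convolution along the channel axis; the channel count is
    # preserved, so there is no dimensionality reduction.
    weights = [
        sigmoid(sum(w * padded[c + j] for j, w in enumerate(kernel)))
        for c in range(len(pooled))
    ]
    # Scale every value in a channel by that channel's weight.
    scaled = [
        [[v * weights[c] for v in row] for row in ch]
        for c, ch in enumerate(feature_map)
    ]
    return scaled, weights


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, y: label in {0, 1}.
    gamma and alpha defaults are assumptions, not values from the article.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    fmap = [[[1.0, 3.0]], [[2.0, 2.0]], [[0.0, 4.0]]]  # 3 channels, 1 x 2 each
    _, w = channel_attention(fmap, [0.2, 0.6, 0.2])
    print("channel weights:", w)
    print("easy sample loss:", focal_loss(0.9, 1))
    print("hard sample loss:", focal_loss(0.1, 1))
```

With `gamma=0` the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy. Larger `gamma` values shrink the contribution of well-classified samples, which is the down-weighting behaviour the abstract describes.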


LI Wentao, PENG Li. Small Objects Detection Algorithm with Multi-scale Channel Attention Fusion Network[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(12): 2390-2400.

