Deep Small Object Detection Algorithm Integrating Attention Mechanism

Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue 4: 927-937. DOI: 10.3778/j.issn.1673-9418.2108087

• Graphics and Image •

ZHAO Pengfei, XIE Linbo⁺, PENG Li

Engineering Research Center of Internet of Things Technology Applications (School of Internet of Things Engineering, Jiangnan University), Ministry of Education, Wuxi, Jiangsu 214122, China

Received: 2021-07-22; Revised: 2021-09-30; Online: 2022-04-01; Published: 2021-10-18
About the authors:
ZHAO Pengfei, born in 1996 in Yancheng, Jiangsu, is an M.S. candidate. His research interests include visual object detection and deep learning.
XIE Linbo, born in 1973 in Yongzhou, Hunan, Ph.D., is a professor, Ph.D. supervisor, and member of CAA. His research interests include process modeling and control, intelligent detection, and system safety.
PENG Li, born in 1967 in Tangshan, Hebei, Ph.D., is a professor, Ph.D. supervisor, and member of CAAI and CCF. His research interests include the visual Internet of Things and intelligent detection.
Supported by: National Natural Science Foundation of China (61873112); National Key Research and Development Program of China (2018YFD0400902)

Corresponding author: ⁺ E-mail: xie_linbo@jiangnan.edu.cn

Abstract:

Insufficient feature extraction in the backbone network and a lack of semantic information in shallow convolutional layers often lead to poor detection of small objects. To improve the accuracy and robustness of small object detection, this paper proposes a deep small object detection algorithm that integrates an attention mechanism. First, to address the backbone's insufficient feature extraction capability, Darknet-53 is selected as the feature extraction network, and a new grouped residual connection replaces the residual connection structure of the original Darknet-53, forming an enhanced backbone named I-Darknet53. By interweaving the feature information of different channels, this grouped residual structure effectively enlarges the receptive field of the output. Second, in the multi-scale detection phase, a shallow feature enhancement network is proposed. It combines a feature enhancement module with an efficient feature fusion strategy guided by a channel attention mechanism to fuse shallow and deep layers into enhanced shallow features, which mitigates the lack of semantic information in shallow features. Experimental results show that the proposed algorithm outperforms the SSD algorithm on the PASCAL VOC dataset: the mean average precision (mAP) is 80.2% with 300 $\times$ 300 input images and 82.3% with 500 $\times$ 500 input images. In addition, the algorithm improves detection accuracy on small objects while maintaining detection speed.

Key words: small object detection, feature extraction, feature fusion, attention mechanism
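
The abstract states that the grouped residual structure enlarges the receptive field but does not give its wiring. As a rough illustration only, the sketch below assumes a hierarchical layout in which the channels are split into groups, the first group is passed through unchanged, and each later group applies one stride-1 convolution to its own input plus the previous group's output. The function name and this wiring are assumptions, not the paper's definition of I-Darknet53.

```python
def grouped_residual_receptive_fields(num_groups, kernel_size=3):
    """Receptive field (pixels per side) of each channel group's output.

    Assumed wiring (hypothetical, for illustration):
      group 0: identity, so it sees 1 pixel;
      group i >= 1: one stride-1 kernel_size x kernel_size convolution applied
      to its own input plus the output of group i - 1.
    """
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    fields = [1]      # group 0 is passed through unchanged
    widest_input = 1  # widest receptive field among the inputs of the next conv
    for _ in range(1, num_groups):
        # A stride-1 convolution widens the receptive field by kernel_size - 1.
        widest_input += kernel_size - 1
        fields.append(widest_input)
    return fields


if __name__ == "__main__":
    print(grouped_residual_receptive_fields(4))
```

Under these assumptions, four groups with 3 × 3 kernels reach receptive fields of 1, 3, 5 and 7 pixels, whereas a block with a single 3 × 3 convolution stops at 3.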

CLC Number: TP391.4

ZHAO Pengfei, XIE Linbo, PENG Li. Deep Small Object Detection Algorithm Integrating Attention Mechanism[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 927-937.

References

[1]	LIU Y, LIU H Y, FAN J L, et al. A survey of research and application of small object detection based on deep learning[J]. Acta Electronica Sinica, 2020, 48(3):590-601.
[2]	LIU Y, ZHAN Y W. Survey of small object detection algorithms based on deep learning[J]. Computer Engineering and Applications, 2021, 57(2):37-48.
[3]	REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th Annual Conference on Neural Information Processing Systems, Montreal, Dec 7-12, 2015. Red Hook: Curran Associates, 2015: 91-99.
[4]	DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]// Proceedings of the 29th Annual Conference on Neural Information Processing Systems, Barcelona, Dec 5-10, 2016. Red Hook: Curran Associates, 2016: 379-387.
[5]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37.
[6]	REDMON J, DIVVALA S K, GIRSHICK R B, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788.
[7]	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6517-6525.
[8]	REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[9]	LI Z X, ZHOU F Q. FSSD: feature fusion single shot multibox detector[J]. arXiv:1712.00960, 2017.
[10]	LIU S T, HUANG D, WANG Y H. Receptive field block net for accurate and fast object detection[C]// LNCS 11215: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 404-419.
[11]	CHEN H J, WANG Q Q, YANG G W, et al. SSD object detection algorithm with multi-scale convolution feature fusion[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(6):1049-1061.
[12]	LIANG Y Y, LI J B. Small objects detection method based on multi-scale non-local attention network[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(10):1744-1753.
[13]	MISRA D. Mish: a self regularized non-monotonic neural activation function[J]. arXiv:1908.08681, 2019.
[14]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7132-7141.
[15]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19.
[16]	WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 11531-11539.
[17]	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 618-626.
[18]	EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The Pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2):303-338.
[19]	ZHANG Y L, YUAN Y, FENG Y C, et al. Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(8):5535-5548.
[20]	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[J]. arXiv:1701.06659, 2017.
[21]	JEONG J, PARK H, KWAK N. Enhancement of SSD by concatenating feature maps for object detection[J]. arXiv:1705.09587, 2017.
[22]	ZHAO H, LI Z W, FANG L F, et al. A balanced feature fusion SSD for object detection[J]. Neural Processing Letters, 2020, 51(3):2789-2806.
[23]	WU X W, SAHOO D, ZHANG D X, et al. Single-shot bidirectional pyramid networks for high-quality object detection[J]. Neurocomputing, 2020, 401:1-9.

Algorithm	Backbone	Training set	Test set	Input size	GPU	mAP/%	Speed/(frame/s)
Faster R-CNN^[3]	VGG16	VOC2007+VOC2012	VOC2007	600×1000	Titan X	73.2	7.0
R-FCN^[4]	ResNet-101	VOC2007+VOC2012	VOC2007	600×1000	Titan X	80.5	9.0
SSD^[5]	VGG16	VOC2007+VOC2012	VOC2007	300×300	1080Ti	77.2	62.0
SSD^[5]	VGG16	VOC2007+VOC2012	VOC2007	512×512	1080Ti	79.5	36.0
YOLOv2^[7]	Darknet-19	VOC2007+VOC2012	VOC2007	416×416	Titan X	76.8	67.0
YOLOv3^[8]	Darknet-53	VOC2007+VOC2012	VOC2007	416×416	1080Ti	79.3	39.0
FSSD^[9]	VGG16	VOC2007+VOC2012	VOC2007	300×300	1080Ti	78.8	65.8
DSSD^[20]	ResNet-101	VOC2007+VOC2012	VOC2007	321×321	Titan X	78.6	9.5
R-SSD^[21]	VGG16	VOC2007+VOC2012	VOC2007	300×300	Titan X	78.5	35.0
BFSSD^[22]	VGG16	VOC2007+VOC2012	VOC2007	300×300	1080Ti	79.2	45.1
BPN^[23]	VGG16	VOC2007+VOC2012	VOC2007	320×320	1080Ti	80.3	32.4
BPN^[23]	VGG16	VOC2007+VOC2012	VOC2007	512×512	1080Ti	82.2	18.9
Ours	I-Darknet53	VOC2007+VOC2012	VOC2007	300×300	1080Ti	80.2	48.0
Ours	I-Darknet53	VOC2007+VOC2012	VOC2007	512×512	1080Ti	82.3	32.0
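
As a worked check of the accuracy and speed trade-off against SSD, the short sketch below takes the SSD and "Ours" rows of the table above (both measured on a 1080Ti) and converts frame rates into per-frame latency. The dictionary layout and the helper name `compare` are illustrative choices, not part of the paper.

```python
# input size -> (mAP in %, speed in frame/s), copied from the table rows above
SSD = {300: (77.2, 62.0), 512: (79.5, 36.0)}
OURS = {300: (80.2, 48.0), 512: (82.3, 32.0)}


def compare(baseline, proposed):
    """Return (mAP gain in percentage points, extra latency in ms per frame)."""
    base_map, base_fps = baseline
    new_map, new_fps = proposed
    gain = round(new_map - base_map, 1)
    # Latency per frame in milliseconds is 1000 / (frames per second).
    extra_ms = round(1000.0 / new_fps - 1000.0 / base_fps, 1)
    return gain, extra_ms


if __name__ == "__main__":
    for size in (300, 512):
        gain, extra_ms = compare(SSD[size], OURS[size])
        print(f"{size}x{size}: +{gain} mAP points, +{extra_ms} ms per frame")
```

With these numbers the proposed model gains 3.0 mAP points at about 4.7 ms more per frame for 300×300 input, and 2.8 points at about 3.5 ms more for 512×512 input.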

Algorithm	mAP	bird	bottle	plant	chair	boat
Faster R-CNN^[3]	55.9	70.9	52.1	38.8	52.0	65.5
R-FCN^[4]	67.4	81.5	62.8	53.7	67.0	72.0
SSD^[5]	61.7	76.0	50.5	52.3	60.3	69.6
YOLOv3^[8]	66.2	78.6	57.8	56.5	66.3	71.9
DSSD^[20]	63.1	80.5	53.9	51.7	61.1	68.4
BFSSD^[22]	65.2	79.8	55.5	56.9	61.2	72.5
Ours300	68.5	80.7	59.7	58.4	68.2	75.6
Ours512	71.9	81.9	62.9	64.8	71.2	78.9
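
The mAP column in the table above appears to be the plain mean of the five listed class APs. The sketch below checks this for every row; the tolerance of 0.05 simply allows for rounding to one decimal place, and the names `ROWS`, `mean_ap` and `is_consistent` are illustrative.

```python
# algorithm -> (reported mAP, [bird, bottle, plant, chair, boat] APs), from the table above
ROWS = {
    "Faster R-CNN": (55.9, [70.9, 52.1, 38.8, 52.0, 65.5]),
    "R-FCN": (67.4, [81.5, 62.8, 53.7, 67.0, 72.0]),
    "SSD": (61.7, [76.0, 50.5, 52.3, 60.3, 69.6]),
    "YOLOv3": (66.2, [78.6, 57.8, 56.5, 66.3, 71.9]),
    "DSSD": (63.1, [80.5, 53.9, 51.7, 61.1, 68.4]),
    "BFSSD": (65.2, [79.8, 55.5, 56.9, 61.2, 72.5]),
    "Ours300": (68.5, [80.7, 59.7, 58.4, 68.2, 75.6]),
    "Ours512": (71.9, [81.9, 62.9, 64.8, 71.2, 78.9]),
}


def mean_ap(class_aps):
    """Mean average precision as the arithmetic mean of per-class APs."""
    return sum(class_aps) / len(class_aps)


def is_consistent(reported, class_aps, tol=0.05):
    """True if the reported mAP matches the class mean up to rounding."""
    return abs(mean_ap(class_aps) - reported) <= tol + 1e-9


if __name__ == "__main__":
    for name, (reported, aps) in ROWS.items():
        print(f"{name}: reported {reported}, mean {mean_ap(aps):.2f}")
```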

Algorithm	mAP/%	Speed on 1080Ti/(frame/s)	airplane AP/%	ship AP/%	storage tank AP/%	tennis court AP/%
Faster R-CNN^[3]	72.4	11	74.3	78.7	71.9	64.5
R-FCN^[4]	74.9	27	76.6	80.3	74.2	68.5
SSD^[5]	76.5	62	79.5	81.9	75.2	69.4
YOLOv3^[8]	80.9	66	86.2	85.7	77.3	74.6
DSSD^[20]	78.9	13	81.9	84.9	78.4	70.5
R-SSD^[21]	77.7	35	80.7	83.2	77.1	69.8
Ours	89.9	48	90.8	90.1	90.5	88.4

Algorithm	mAP/%
SSD	77.2
SSD+Darknet-53	77.9
SSD+I-Darknet53	78.3
SSD+I-Darknet53+FEM	78.6
SSD+I-Darknet53+FEM+Feature fusion	79.3
SSD+I-Darknet53+FEM+Feature fusion+SE	79.7
SSD+I-Darknet53+FEM+Feature fusion+CBAM	79.9
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=3)	80.2
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=5)	79.9
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=7)	79.8
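
The ablation rows above vary a kernel size K for the ECAM variant, but this page does not define ECAM. Assuming it follows the efficient channel attention of ECA-Net [16] (global average pooling, a 1D convolution of odd size K across neighbouring channels, then a sigmoid gate), a minimal pure-Python sketch looks like this. All function names are hypothetical, and the kernel values would be learned in practice.

```python
import math


def eca_channel_weights(channel_means, conv_kernel):
    """Sigmoid gate per channel from a zero-padded 1D convolution over channels.

    channel_means: global-average-pooled value of each channel.
    conv_kernel:   K weights shared by all channels (K must be odd).
    """
    k = len(conv_kernel)
    if k % 2 == 0:
        raise ValueError("kernel size K must be odd")
    pad = k // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        # Each channel interacts only with its K nearest neighbours.
        z = sum(w * padded[c + j] for j, w in enumerate(conv_kernel))
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights


def apply_channel_attention(feature_map, conv_kernel):
    """Rescale each channel (a 2D list, H x W) by its attention weight."""
    means = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    weights = eca_channel_weights(means, conv_kernel)
    return [
        [[w * value for value in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]


if __name__ == "__main__":
    fmap = [[[1.0, 3.0]], [[2.0, 2.0]], [[0.0, 4.0]]]  # 3 channels, 1 x 2 each
    print(apply_channel_attention(fmap, [0.1, 0.5, 0.1]))  # K = 3
```

In this formulation the attention branch has only K parameters regardless of the channel count, so changing K (3, 5 or 7 in the table) changes how many neighbouring channels interact rather than the model size in any meaningful way.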
