Deep Small Object Detection Algorithm Integrating Attention Mechanism

Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (4): 927-937. DOI: 10.3778/j.issn.1673-9418.2108087


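ZHAO Pengfei, XIE Linbo⁺, PENG Li

Engineering Research Center of Internet of Things Technology Applications (School of Internet of Things Engineering, Jiangnan University), Ministry of Education, Wuxi, Jiangsu 214122, China

Received: 2021-07-22  Revised: 2021-09-30  Issue date: 2022-04-01  Online: 2021-10-18
Corresponding author: XIE Linbo⁺, E-mail: xie_linbo@jiangnan.edu.cn
About the authors: ZHAO Pengfei, born in 1996 in Yancheng, Jiangsu, M.S. candidate. His research interests include object detection and deep learning.
XIE Linbo, born in 1973 in Yongzhou, Hunan, Ph.D., professor, Ph.D. supervisor, member of CAA. His research interests include process modeling and control, intelligent detection and system safety.
PENG Li, born in 1967 in Tangshan, Hebei, Ph.D., professor, Ph.D. supervisor, member of CAAI and CCF. His research interests include the visual Internet of Things and intelligent detection.
Supported by:
National Natural Science Foundation of China (61873112); National Key Research and Development Program of China (2018YFD0400902)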

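Abstract:

Small objects are often detected poorly because the backbone network extracts features insufficiently and shallow convolutional layers lack semantic information. To improve the accuracy and robustness of small object detection, this paper proposes a deep small object detection algorithm that integrates an attention mechanism.

First, to address the limited feature extraction capability of the backbone, Darknet-53 is adopted as the feature extraction network. Its residual connections are replaced with a new grouped residual connection, forming an enhanced backbone named I-Darknet53. By interleaving feature information from different channels, this grouped residual structure effectively enlarges the receptive field of its output.

Second, a shallow feature enhancement network is proposed for the multi-scale detection stage. It fuses shallow and deep layers through a feature enhancement module and an efficient feature fusion strategy guided by a channel attention mechanism. The resulting enhanced shallow features compensate for the lack of semantic information in shallow layers.

Experimental results on the PASCAL VOC dataset show that the proposed algorithm outperforms SSD. With a 300×300 input, the model achieves a mean average precision (mAP) of 80.2%; with a 512×512 input, it achieves 82.3%. It also improves the detection of small objects while maintaining detection speed.

Key words: small object detection, feature extraction, feature fusion, attention mechanism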
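The abstract states that the grouped residual connection enlarges the receptive field by interleaving features from different channels. The exact I-Darknet53 block is not given here, so the following is only a minimal sketch under an assumed Res2Net-style hierarchy:

- The channels are split into groups.
- Group 0 passes through unchanged.
- Each later group is added to the previous group's output before its own 3-tap convolution.

The helper names (`identity`, `conv3`, `add`, `grouped_residual_rf`) are hypothetical. The sketch works in 1D and tracks which input positions each output group can see.

```python
def identity(p):
    """An input group before any convolution: position p sees only input position p."""
    return {p}


def conv3(dep):
    """Stride-1, 3-tap convolution applied to a feature whose dependency map is `dep`.

    `dep(p)` returns the set of input positions that feature value p depends on;
    after the convolution, position p depends on whatever p-1, p and p+1 depended on.
    """
    return lambda p: dep(p - 1) | dep(p) | dep(p + 1)


def add(dep_a, dep_b):
    """Element-wise addition: the sum depends on the inputs of both operands."""
    return lambda p: dep_a(p) | dep_b(p)


def grouped_residual_rf(num_groups):
    """Receptive-field width of each output group in a hypothetical hierarchical block.

    Group 0 passes through unchanged; group 1 goes through one 3-tap conv; each later
    group i is first added to the output of group i-1 and then convolved, which is how
    features are interleaved across channel groups in this sketch.
    """
    outputs = [identity]
    prev = None
    for _ in range(1, num_groups):
        inp = identity if prev is None else add(identity, prev)
        prev = conv3(inp)
        outputs.append(prev)
    return [len(out(0)) for out in outputs]


print(grouped_residual_rf(4))  # [1, 3, 5, 7]
print(grouped_residual_rf(2))  # [1, 3]
```

Under this assumption, each extra group adds two positions of context while every convolution stays 3-tap. With stride-1 3×3 kernels the same count holds on each spatial axis, giving 1×1, 3×3, 5×5 and 7×7 receptive fields across four groups. Concatenating the groups therefore mixes several receptive-field sizes in one output.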

CLC number: TP391.4

Cite this article: ZHAO Pengfei, XIE Linbo, PENG Li. Deep Small Object Detection Algorithm Integrating Attention Mechanism[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 927-937.

References

[1]	LIU Y, LIU H Y, FAN J L, et al. A survey of research and application of small object detection based on deep learning[J]. Acta Electronica Sinica, 2020, 48(3): 590-601.
[2]	LIU Y, ZHAN Y W. Survey of small object detection algorithms based on deep learning[J]. Computer Engineering and Applications, 2021, 57(2): 37-48.
[3]	REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th Annual Conference on Neural Information Processing Systems, Montreal, Dec 7-12, 2015. Red Hook: Curran Associates, 2015: 91-99.
[4]	DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]// Proceedings of the 29th Annual Conference on Neural Information Processing Systems, Barcelona, Dec 5-10, 2016. Red Hook: Curran Associates, 2016: 379-387.
[5]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37.
[6]	REDMON J, DIVVALA S K, GIRSHICK R B, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788.
[7]	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6517-6525.
[8]	REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[9]	LI Z X, ZHOU F Q. FSSD: feature fusion single shot multibox detector[J]. arXiv:1712.00960, 2017.
[10]	LIU S T, HUANG D, WANG Y H. Receptive field block net for accurate and fast object detection[C]// LNCS 11215: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 404-419.
[11]	CHEN H J, WANG Q Q, YANG G W, et al. SSD object detection algorithm with multi-scale convolution feature fusion[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(6): 1049-1061.
[12]	LIANG Y Y, LI J B. Small objects detection method based on multi-scale non-local attention network[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(10): 1744-1753.
[13]	MISRA D. Mish: a self regularized non-monotonic neural activation function[J]. arXiv:1908.08681, 2019.
[14]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7132-7141.
[15]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19.
[16]	WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 11531-11539.
[17]	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 618-626.
[18]	EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The Pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
[19]	ZHANG Y L, YUAN Y, FENG Y C, et al. Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(8): 5535-5548.
[20]	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[J]. arXiv:1701.06659, 2017.
[21]	JEONG J, PARK H, KWAK N. Enhancement of SSD by concatenating feature maps for object detection[J]. arXiv:1705.09587, 2017.
[22]	ZHAO H, LI Z W, FANG L F, et al. A balanced feature fusion SSD for object detection[J]. Neural Processing Letters, 2020, 51(3): 2789-2806.
[23]	WU X W, SAHOO D, ZHANG D X, et al. Single-shot bidirectional pyramid networks for high-quality object detection[J]. Neurocomputing, 2020, 401: 1-9.

Algorithm	Backbone	Training set	Test set	Input size	GPU	mAP/%	Speed/(frame/s)
Faster R-CNN[3]	VGG16	VOC2007+VOC2012	VOC2007	600×1000	Titan X	73.2	7.0
R-FCN[4]	ResNet-101	VOC2007+VOC2012	VOC2007	600×1000	Titan X	80.5	9.0
SSD[5]	VGG16	VOC2007+VOC2012	VOC2007	300×300	1080Ti	77.2	62.0
SSD[5]	VGG16	VOC2007+VOC2012	VOC2007	512×512	1080Ti	79.5	36.0
YOLOv2[7]	Darknet-19	VOC2007+VOC2012	VOC2007	416×416	Titan X	76.8	67.0
YOLOv3[8]	Darknet-53	VOC2007+VOC2012	VOC2007	416×416	1080Ti	79.3	39.0
FSSD[9]	VGG16	VOC2007+VOC2012	VOC2007	300×300	1080Ti	78.8	65.8
DSSD[20]	ResNet-101	VOC2007+VOC2012	VOC2007	321×321	Titan X	78.6	9.5
R-SSD[21]	VGG16	VOC2007+VOC2012	VOC2007	300×300	Titan X	78.5	35.0
BFSSD[22]	VGG16	VOC2007+VOC2012	VOC2007	300×300	1080Ti	79.2	45.1
BPN[23]	VGG16	VOC2007+VOC2012	VOC2007	320×320	1080Ti	80.3	32.4
BPN[23]	VGG16	VOC2007+VOC2012	VOC2007	512×512	1080Ti	82.2	18.9
Ours	I-Darknet53	VOC2007+VOC2012	VOC2007	300×300	1080Ti	80.2	48.0
Ours	I-Darknet53	VOC2007+VOC2012	VOC2007	512×512	1080Ti	82.3	32.0
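The table reports throughput in frame/s. A small worked calculation makes the accuracy/speed trade-off against SSD explicit, using only the SSD and "Ours" rows, which were both measured on a 1080Ti. The dictionary and function names are illustrative, not from the source.

```python
# mAP (%) and speed (frame/s) on a 1080Ti, copied from the comparison table.
SSD = {300: (77.2, 62.0), 512: (79.5, 36.0)}
OURS = {300: (80.2, 48.0), 512: (82.3, 32.0)}


def latency_ms(fps):
    """Average processing time per image (ms) implied by a throughput in frame/s."""
    return 1000.0 / fps


def compare(size):
    """Return (mAP gain in points, SSD ms/image, proposed ms/image) for one input size."""
    ssd_map, ssd_fps = SSD[size]
    our_map, our_fps = OURS[size]
    return our_map - ssd_map, latency_ms(ssd_fps), latency_ms(our_fps)


for size in (300, 512):
    gain, ssd_ms, our_ms = compare(size)
    print(f"{size}x{size}: +{gain:.1f} mAP points, "
          f"{ssd_ms:.1f} ms -> {our_ms:.1f} ms per image")
```

Against SSD, the proposed model gains about 3.0 points at 300×300 and 2.8 points at 512×512. In exchange, per-image time rises from roughly 16.1 ms to 20.8 ms and from 27.8 ms to about 31 ms. Rows measured on a Titan X are left out because speeds on different GPUs are not directly comparable.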

Algorithm	mAP/%	bird AP/%	bottle AP/%	plant AP/%	chair AP/%	boat AP/%
Faster R-CNN[3]	55.9	70.9	52.1	38.8	52.0	65.5
R-FCN[4]	67.4	81.5	62.8	53.7	67.0	72.0
SSD[5]	61.7	76.0	50.5	52.3	60.3	69.6
YOLOv3[8]	66.2	78.6	57.8	56.5	66.3	71.9
DSSD[20]	63.1	80.5	53.9	51.7	61.1	68.4
BFSSD[22]	65.2	79.8	55.5	56.9	61.2	72.5
Ours300	68.5	80.7	59.7	58.4	68.2	75.6
Ours512	71.9	81.9	62.9	64.8	71.2	78.9
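The mAP column of this small-object table appears to be the unweighted mean of the five listed class APs, not the 20-class VOC mAP. This is our reading of the numbers; the source does not state it. The check below recomputes the mean from the copied values and lists any row that disagrees by more than rounding to one decimal. `ROWS` and `inconsistent_rows` are illustrative names.

```python
# Per-class AP (%) on five small-object classes, copied from the table:
# name -> (reported mAP, [bird, bottle, plant, chair, boat])
ROWS = {
    "Faster R-CNN": (55.9, [70.9, 52.1, 38.8, 52.0, 65.5]),
    "R-FCN": (67.4, [81.5, 62.8, 53.7, 67.0, 72.0]),
    "SSD": (61.7, [76.0, 50.5, 52.3, 60.3, 69.6]),
    "YOLOv3": (66.2, [78.6, 57.8, 56.5, 66.3, 71.9]),
    "DSSD": (63.1, [80.5, 53.9, 51.7, 61.1, 68.4]),
    "BFSSD": (65.2, [79.8, 55.5, 56.9, 61.2, 72.5]),
    "Ours300": (68.5, [80.7, 59.7, 58.4, 68.2, 75.6]),
    "Ours512": (71.9, [81.9, 62.9, 64.8, 71.2, 78.9]),
}


def mean_ap(aps):
    """Unweighted mean of per-class AP values."""
    return sum(aps) / len(aps)


def inconsistent_rows(rows, tol=0.05):
    """Names whose reported mAP differs from the class mean by more than rounding."""
    return [name for name, (reported, aps) in rows.items()
            if abs(mean_ap(aps) - reported) > tol]


print(inconsistent_rows(ROWS))  # []
```

Every row agrees to within 0.05. For example, Faster R-CNN's five APs sum to 279.3, whose mean is 55.86, reported as 55.9.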

Algorithm	mAP/%	Speed (1080Ti)/(frame/s)	airplane AP/%	ship AP/%	storage tank AP/%	tennis court AP/%
Faster R-CNN[3]	72.4	11	74.3	78.7	71.9	64.5
R-FCN[4]	74.9	27	76.6	80.3	74.2	68.5
SSD[5]	76.5	62	79.5	81.9	75.2	69.4
YOLOv3[8]	80.9	66	86.2	85.7	77.3	74.6
DSSD[20]	78.9	13	81.9	84.9	78.4	70.5
R-SSD[21]	77.7	35	80.7	83.2	77.1	69.8
Ours	89.9	48	90.8	90.1	90.5	88.4

Algorithm	mAP/%
SSD	77.2
SSD+Darknet-53	77.9
SSD+I-Darknet53	78.3
SSD+I-Darknet53+FEM	78.6
SSD+I-Darknet53+FEM+Feature fusion	79.3
SSD+I-Darknet53+FEM+Feature fusion+SE	79.7
SSD+I-Darknet53+FEM+Feature fusion+CBAM	79.9
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=3)	80.2
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=5)	79.9
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=7)	79.8
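The ablation attributes the last gains to feature fusion followed by ECA-style channel attention (ECAM), which works best at K=3. The source does not spell out the fusion operator, so the sketch below makes these assumptions:

- **Fusion:** the deeper map is upsampled 2× by nearest-neighbour interpolation and added element-wise to the shallow map.
- **Attention:** the ECA recipe of Wang et al. [16] is applied to the fused map. This means global average pooling per channel, a 1D convolution of size K across neighbouring channels with no dimensionality reduction, a sigmoid, and channel-wise rescaling.

The K convolution weights are fixed placeholders here; ECA learns them. The helpers `upsample2x`, `fuse`, `eca_weights` and `eca` are hypothetical names, written in plain Python to stay dependency-free.

```python
import math


def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a feature map stored as [C][H][W] nested lists."""
    return [[[v for v in row for _col in range(2)] for row in ch for _row in range(2)]
            for ch in fmap]


def fuse(shallow, deep):
    """Hypothetical fusion: upsample the deeper map 2x and add it element-wise
    to the shallow map (both maps must have the same channel count C)."""
    up = upsample2x(deep)
    return [[[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
            for c1, c2 in zip(shallow, up)]


def eca_weights(fmap, k=3, kernel=None):
    """ECA-style channel weights: global average pooling, a 1D convolution of size k
    across neighbouring channels (zero padding, no reduction), then a sigmoid."""
    if k % 2 == 0:
        raise ValueError("k must be odd so each channel keeps exactly one weight")
    kernel = kernel or [1.0 / k] * k  # placeholder weights; ECA learns these
    desc = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    pad = (k - 1) // 2
    padded = [0.0] * pad + desc + [0.0] * pad
    z = [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(desc))]
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def eca(fmap, k=3, kernel=None):
    """Rescale every channel of fmap by its ECA-style attention weight."""
    weights = eca_weights(fmap, k, kernel)
    return [[[v * w for v in row] for row in ch] for ch, w in zip(fmap, weights)]


# Toy example: a 3-channel shallow map (4x4) and a deeper map (2x2).
shallow = [[[float(c)] * 4 for _ in range(4)] for c in range(3)]
deep = [[[1.0] * 2 for _ in range(2)] for _ in range(3)]
fused = fuse(shallow, deep)
out = eca(fused, k=3)
print(len(out), len(out[0]), len(out[0][0]))  # 3 4 4
print([round(w, 3) for w in eca_weights(fused, k=3)])
```

A larger K lets each channel weight draw on more neighbouring channel descriptors. In the ablation, widening K from 3 to 5 or 7 lowered mAP slightly (80.2% to 79.9% and 79.8%).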
