Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (4): 927-937. DOI: 10.3778/j.issn.1673-9418.2108087
ZHAO Pengfei, XIE Linbo+, PENG Li
Received: 2021-07-22
Revised: 2021-09-30
Online: 2022-04-01
Published: 2021-10-18
Corresponding author: + E-mail: xie_linbo@jiangnan.edu.cn
About author: ZHAO Pengfei, born in 1996 in Yancheng, Jiangsu, M.S. candidate. His research interests include visual object detection and deep learning.
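Abstract:
Insufficient feature extraction in the backbone network and the lack of semantic information in shallow convolutional layers often lead to poor detection of small objects. To improve the accuracy and robustness of small object detection, a deep small object detection algorithm integrating an attention mechanism is proposed. First, to address the limited feature extraction capability of the backbone, Darknet-53 is chosen as the feature extraction network, and a new grouped residual connection replaces the residual connection structure of the original Darknet-53, forming the enhanced backbone I-Darknet53. By interleaving feature information from different channels, the grouped residual structure effectively enlarges the receptive field of the output. Second, in the multi-scale detection stage, a shallow feature enhancement network is proposed: a feature enhancement module and an efficient feature fusion strategy guided by a channel attention mechanism fuse shallow and deep features into enhanced shallow features, which alleviates the lack of semantic information in the shallow layers. Experimental results show that, compared with the SSD algorithm, the proposed algorithm achieves better detection performance on the PASCAL VOC dataset. When the input image size is 300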
ZHAO Pengfei, XIE Linbo, PENG Li. Deep Small Object Detection Algorithm Integrating Attention Mechanism[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 927-937.
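The abstract describes feature fusion guided by channel attention, and Table 4 lists an ECAM variant with kernel sizes K=3, 5 and 7. This page does not spell out the ECAM module, so the following is only a minimal pure-Python sketch of ECA-style channel attention in the manner of ECA-Net [16], not the authors' implementation. The function name `eca_attention`, the nested-list C×H×W layout and the fixed kernel weights are assumptions for illustration; in a trained network the 1D kernel weights would be learned.

```python
import math


def eca_attention(feature_map, kernel_weights):
    """ECA-style channel attention on a C x H x W nested list.

    kernel_weights is an odd-length 1D kernel of size K (assumed fixed
    here; it would be learned in a real network).
    Returns (rescaled feature map, per-channel gates).
    """
    k = len(kernel_weights)
    if k % 2 == 0:
        raise ValueError("kernel size K must be odd")
    pad = k // 2

    # Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

    # 1D convolution across neighbouring channels (zero padding),
    # followed by a sigmoid to get a gate in (0, 1) per channel.
    gates = []
    for i in range(len(pooled)):
        acc = 0.0
        for j, w in enumerate(kernel_weights):
            idx = i + j - pad
            if 0 <= idx < len(pooled):
                acc += w * pooled[idx]
        gates.append(1.0 / (1.0 + math.exp(-acc)))

    # Rescale every value of a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in channel]
        for g, channel in zip(gates, feature_map)
    ]
    return scaled, gates


if __name__ == "__main__":
    # Four 2 x 2 channels and a K = 3 kernel with hypothetical weights.
    x = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
    y, g = eca_attention(x, [0.2, 0.6, 0.2])
    print([round(v, 3) for v in g])
```

The kernel size K controls how many neighbouring channels contribute to each gate, which is the quantity varied in the ECAM rows of Table 4.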
Algorithm | Backbone | Training set | Test set | Input size | GPU | mAP/% | Speed/(frame/s) |
---|---|---|---|---|---|---|---|
Faster R-CNN | VGG16 | VOC2007+VOC2012 | VOC2007 | 600×1000 | Titan X | 73.2 | 7.0 |
R-FCN | ResNet-101 | VOC2007+VOC2012 | VOC2007 | 600×1000 | Titan X | 80.5 | 9.0 |
SSD | VGG16 | VOC2007+VOC2012 | VOC2007 | 300×300 | 1080Ti | 77.2 | 62.0 |
SSD | VGG16 | VOC2007+VOC2012 | VOC2007 | 512×512 | 1080Ti | 79.5 | 36.0 |
YOLOv2 | Darknet-19 | VOC2007+VOC2012 | VOC2007 | 416×416 | Titan X | 76.8 | 67.0 |
YOLOv3 | Darknet-53 | VOC2007+VOC2012 | VOC2007 | 416×416 | 1080Ti | 79.3 | 39.0 |
FSSD | VGG16 | VOC2007+VOC2012 | VOC2007 | 300×300 | 1080Ti | 78.8 | 65.8 |
DSSD | ResNet-101 | VOC2007+VOC2012 | VOC2007 | 321×321 | Titan X | 78.6 | 9.5 |
R-SSD | VGG16 | VOC2007+VOC2012 | VOC2007 | 300×300 | Titan X | 78.5 | 35.0 |
BFSSD | VGG16 | VOC2007+VOC2012 | VOC2007 | 300×300 | 1080Ti | 79.2 | 45.1 |
BPN | VGG16 | VOC2007+VOC2012 | VOC2007 | 320×320 | 1080Ti | 80.3 | 32.4 |
BPN | VGG16 | VOC2007+VOC2012 | VOC2007 | 512×512 | 1080Ti | 82.2 | 18.9 |
Ours | I-Darknet53 | VOC2007+VOC2012 | VOC2007 | 300×300 | 1080Ti | 80.2 | 48.0 |
Ours | I-Darknet53 | VOC2007+VOC2012 | VOC2007 | 512×512 | 1080Ti | 82.3 | 32.0 |
Table 1 Comparison of different algorithms on VOC2007 test set
Algorithm | mAP | bird | bottle | plant | chair | boat |
---|---|---|---|---|---|---|
Faster R-CNN | 55.9 | 70.9 | 52.1 | 38.8 | 52.0 | 65.5 |
R-FCN | 67.4 | 81.5 | 62.8 | 53.7 | 67.0 | 72.0 |
SSD | 61.7 | 76.0 | 50.5 | 52.3 | 60.3 | 69.6 |
YOLOv3 | 66.2 | 78.6 | 57.8 | 56.5 | 66.3 | 71.9 |
DSSD | 63.1 | 80.5 | 53.9 | 51.7 | 61.1 | 68.4 |
BFSSD | 65.2 | 79.8 | 55.5 | 56.9 | 61.2 | 72.5 |
Ours300 | 68.5 | 80.7 | 59.7 | 58.4 | 68.2 | 75.6 |
Ours512 | 71.9 | 81.9 | 62.9 | 64.8 | 71.2 | 78.9 |
Table 2 Comparison of small object detection results on VOC2007 dataset (%)
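The mAP column of Table 2 appears to be the arithmetic mean of the five listed class APs. The short check below recomputes it from the table values; the names `TABLE2`, `mean_ap` and `mismatches` are hypothetical helpers, and the tolerance only allows for the one-decimal rounding of the published figures.

```python
# Reported mAP (%) and per-class AP (%) from Table 2,
# in the order: bird, bottle, plant, chair, boat.
TABLE2 = {
    "Faster R-CNN": (55.9, [70.9, 52.1, 38.8, 52.0, 65.5]),
    "R-FCN": (67.4, [81.5, 62.8, 53.7, 67.0, 72.0]),
    "SSD": (61.7, [76.0, 50.5, 52.3, 60.3, 69.6]),
    "YOLOv3": (66.2, [78.6, 57.8, 56.5, 66.3, 71.9]),
    "DSSD": (63.1, [80.5, 53.9, 51.7, 61.1, 68.4]),
    "BFSSD": (65.2, [79.8, 55.5, 56.9, 61.2, 72.5]),
    "Ours300": (68.5, [80.7, 59.7, 58.4, 68.2, 75.6]),
    "Ours512": (71.9, [81.9, 62.9, 64.8, 71.2, 78.9]),
}


def mean_ap(class_aps):
    """Arithmetic mean of per-class average precision values."""
    return sum(class_aps) / len(class_aps)


def mismatches(table, tol=0.051):
    """Names whose reported mAP differs from the recomputed mean by more than tol."""
    return [
        name
        for name, (reported, aps) in table.items()
        if abs(mean_ap(aps) - reported) > tol
    ]


if __name__ == "__main__":
    for name, (reported, aps) in TABLE2.items():
        print(f"{name}: reported {reported}, recomputed {mean_ap(aps):.2f}")
    print("rows outside tolerance:", mismatches(TABLE2))
```

Under this reading, the mAP here is a small-object average over five classes only, so it is not directly comparable to the 20-class mAP in Table 1.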
Algorithm | mAP/% | Speed (1080Ti)/(frame/s) | AP/% airplane | AP/% ship | AP/% storage tank | AP/% tennis court |
---|---|---|---|---|---|---|
Faster R-CNN | 72.4 | 11 | 74.3 | 78.7 | 71.9 | 64.5 |
R-FCN | 74.9 | 27 | 76.6 | 80.3 | 74.2 | 68.5 |
SSD | 76.5 | 62 | 79.5 | 81.9 | 75.2 | 69.4 |
YOLOv3 | 80.9 | 66 | 86.2 | 85.7 | 77.3 | 74.6 |
DSSD | 78.9 | 13 | 81.9 | 84.9 | 78.4 | 70.5 |
R-SSD | 77.7 | 35 | 80.7 | 83.2 | 77.1 | 69.8 |
Ours | 89.9 | 48 | 90.8 | 90.1 | 90.5 | 88.4 |
Table 3 Comparison of different algorithms on HRRSD dataset
Algorithm | mAP/% |
---|---|
SSD | 77.2 |
SSD+Darknet-53 | 77.9 |
SSD+I-Darknet53 | 78.3 |
SSD+I-Darknet53+FEM | 78.6 |
SSD+I-Darknet53+FEM+Feature fusion | 79.3 |
SSD+I-Darknet53+FEM+Feature fusion+SE | 79.7 |
SSD+I-Darknet53+FEM+Feature fusion+CBAM | 79.9 |
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=3) | 80.2 |
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=5) | 79.9 |
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=7) | 79.8 |
Table 4 Ablation studies on PASCAL VOC2007 test set
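To make the ablation easier to read, the sketch below derives the mAP gain contributed by each added component and compares the attention variants against the shared fusion baseline. It only restates numbers from Table 4; the names `ABLATION`, `ATTENTION`, `stepwise_gains` and `gains_over` are hypothetical helpers, not part of the paper.

```python
# Cumulative configurations from Table 4 (mAP in %), following the ECAM(K=3) path.
ABLATION = [
    ("SSD", 77.2),
    ("SSD+Darknet-53", 77.9),
    ("SSD+I-Darknet53", 78.3),
    ("SSD+I-Darknet53+FEM", 78.6),
    ("SSD+I-Darknet53+FEM+Feature fusion", 79.3),
    ("SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=3)", 80.2),
]

# Attention variants, each added on top of the feature-fusion row (79.3).
ATTENTION = {
    "SE": 79.7,
    "CBAM": 79.9,
    "ECAM(K=3)": 80.2,
    "ECAM(K=5)": 79.9,
    "ECAM(K=7)": 79.8,
}


def stepwise_gains(rows):
    """mAP gain of each configuration over the previous one, to one decimal."""
    return [
        (name, round(cur - prev, 1))
        for (_, prev), (name, cur) in zip(rows, rows[1:])
    ]


def gains_over(baseline, variants):
    """mAP gain of each variant over a common baseline, to one decimal."""
    return {name: round(value - baseline, 1) for name, value in variants.items()}


if __name__ == "__main__":
    for name, gain in stepwise_gains(ABLATION):
        print(f"+{gain:.1f}  {name}")
    print(gains_over(79.3, ATTENTION))
```

Read this way, the attention module is the largest single step on the K=3 path, and enlarging the kernel beyond K=3 lowers the gain in this table.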
[1] LIU Y, LIU H Y, FAN J L, et al. A survey of research and application of small object detection based on deep learning[J]. Acta Electronica Sinica, 2020, 48(3):590-601.
[2] LIU Y, ZHAN Y W. Survey of small object detection algorithms based on deep learning[J]. Computer Engineering and Applications, 2021, 57(2):37-48.
[3] REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th Annual Conference on Neural Information Processing Systems, Montreal, Dec 7-12, 2015. Red Hook: Curran Associates, 2015: 91-99.
[4] DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]// Proceedings of the 29th Annual Conference on Neural Information Processing Systems, Barcelona, Dec 5-10, 2016. Red Hook: Curran Associates, 2016: 379-387.
[5] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37.
[6] REDMON J, DIVVALA S K, GIRSHICK R B, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788.
[7] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6517-6525.
[8] REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[9] LI Z X, ZHOU F Q. FSSD: feature fusion single shot multibox detector[J]. arXiv:1712.00960, 2017.
[10] LIU S T, HUANG D, WANG Y H. Receptive field block net for accurate and fast object detection[C]// LNCS 11215: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 404-419.
[11] CHEN H J, WANG Q Q, YANG G W, et al. SSD object detection algorithm with multi-scale convolution feature fusion[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(6):1049-1061.
[12] LIANG Y Y, LI J B. Small objects detection method based on multi-scale non-local attention network[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(10):1744-1753.
[13] MISRA D. Mish: a self regularized non-monotonic neural activation function[J]. arXiv:1908.08681, 2019.
[14] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7132-7141.
[15] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19.
[16] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 11531-11539.
[17] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 618-626.
[18] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The Pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2):303-338.
[19] ZHANG Y L, YUAN Y, FENG Y C, et al. Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(8):5535-5548.
[20] FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[J]. arXiv:1701.06659, 2017.
[21] JEONG J, PARK H, KWAK N. Enhancement of SSD by concatenating feature maps for object detection[J]. arXiv:1705.09587, 2017.
[22] ZHAO H, LI Z W, FANG L F, et al. A balanced feature fusion SSD for object detection[J]. Neural Processing Letters, 2020, 51(3):2789-2806.
[23] WU X W, SAHOO D, ZHANG D X, et al. Single-shot bidirectional pyramid networks for high-quality object detection[J]. Neurocomputing, 2020, 401:1-9.