Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (4): 927-937. DOI: 10.3778/j.issn.1673-9418.2108087
Graphics and Image
ZHAO Pengfei, XIE Linbo+, PENG Li
Received: 2021-07-22
Revised: 2021-09-30
Online: 2022-04-01
Published: 2021-10-18
About author: ZHAO Pengfei, born in 1996 in Yancheng, Jiangsu, M.S. candidate. His research interests include visual object detection and deep learning.
Corresponding author (+): XIE Linbo, E-mail: xie_linbo@jiangnan.edu.cn
ZHAO Pengfei, XIE Linbo, PENG Li. Deep Small Object Detection Algorithm Integrating Attention Mechanism[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 927-937.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2108087
Table 1 Comparison of different algorithms on VOC2007 test set
Algorithm | Backbone | Training set | Test set | Input size | GPU | mAP/% | Detection speed/(frame/s) |
---|---|---|---|---|---|---|---|
Faster R-CNN[3] | VGG16 | VOC2007+VOC2012 | VOC2007 | 600×1000 | Titan X | 73.2 | 7.0 |
R-FCN[4] | ResNet-101 | VOC2007+VOC2012 | VOC2007 | 600×1000 | Titan X | 80.5 | 9.0 |
SSD[5] | VGG16 | VOC2007+VOC2012 | VOC2007 | 300×300 | 1080Ti | 77.2 | 62.0 |
SSD[5] | VGG16 | VOC2007+VOC2012 | VOC2007 | 512×512 | 1080Ti | 79.5 | 36.0 |
YOLOv2[7] | Darknet-19 | VOC2007+VOC2012 | VOC2007 | 416×416 | Titan X | 76.8 | 67.0 |
YOLOv3[8] | Darknet-53 | VOC2007+VOC2012 | VOC2007 | 416×416 | 1080Ti | 79.3 | 39.0 |
FSSD[9] | VGG16 | VOC2007+VOC2012 | VOC2007 | 300×300 | 1080Ti | 78.8 | 65.8 |
DSSD[20] | ResNet-101 | VOC2007+VOC2012 | VOC2007 | 321×321 | Titan X | 78.6 | 9.5 |
R-SSD[21] | VGG16 | VOC2007+VOC2012 | VOC2007 | 300×300 | Titan X | 78.5 | 35.0 |
BFSSD[22] | VGG16 | VOC2007+VOC2012 | VOC2007 | 300×300 | 1080Ti | 79.2 | 45.1 |
BPN[23] | VGG16 | VOC2007+VOC2012 | VOC2007 | 320×320 | 1080Ti | 80.3 | 32.4 |
BPN[23] | VGG16 | VOC2007+VOC2012 | VOC2007 | 512×512 | 1080Ti | 82.2 | 18.9 |
Ours | I-Darknet53 | VOC2007+VOC2012 | VOC2007 | 300×300 | 1080Ti | 80.2 | 48.0 |
Ours | I-Darknet53 | VOC2007+VOC2012 | VOC2007 | 512×512 | 1080Ti | 82.3 | 32.0 |
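Table 1 reports accuracy and speed side by side. As one rough way to read that trade-off, the Python sketch below (an illustration, not part of the paper's method) lists the rows that no other row beats on both mAP and frame rate. The row labels such as `SSD300` and `BPN320` are shorthand introduced here, and because some speeds were measured on a Titan X and others on a 1080Ti, the comparison is only indicative.

```python
# (mAP in %, detection speed in frame/s) for each row of Table 1.
# Row labels are hypothetical shorthand introduced for this sketch.
TABLE1 = {
    "Faster R-CNN": (73.2, 7.0),
    "R-FCN": (80.5, 9.0),
    "SSD300": (77.2, 62.0),
    "SSD512": (79.5, 36.0),
    "YOLOv2": (76.8, 67.0),
    "YOLOv3": (79.3, 39.0),
    "FSSD": (78.8, 65.8),
    "DSSD321": (78.6, 9.5),
    "R-SSD": (78.5, 35.0),
    "BFSSD": (79.2, 45.1),
    "BPN320": (80.3, 32.4),
    "BPN512": (82.2, 18.9),
    "Ours300": (80.2, 48.0),
    "Ours512": (82.3, 32.0),
}


def dominates(a, b):
    """True if result a is at least as accurate and as fast as b, and not identical to b."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def pareto_front(rows):
    """Names of rows that no other row dominates, kept in table order."""
    return [
        name
        for name, point in rows.items()
        if not any(dominates(other, point) for other in rows.values())
    ]


print(pareto_front(TABLE1))
```

Under this simple criterion the non-dominated rows are YOLOv2, FSSD, BPN320, Ours300 and Ours512; a row that is not listed is beaten on both axes by at least one other entry.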
Table 2 Comparison of small object detection results on VOC2007 dataset (%)
Algorithm | mAP | bird | bottle | plant | chair | boat |
---|---|---|---|---|---|---|
Faster R-CNN[3] | 55.9 | 70.9 | 52.1 | 38.8 | 52.0 | 65.5 |
R-FCN[4] | 67.4 | 81.5 | 62.8 | 53.7 | 67.0 | 72.0 |
SSD[5] | 61.7 | 76.0 | 50.5 | 52.3 | 60.3 | 69.6 |
YOLOv3[8] | 66.2 | 78.6 | 57.8 | 56.5 | 66.3 | 71.9 |
DSSD[20] | 63.1 | 80.5 | 53.9 | 51.7 | 61.1 | 68.4 |
BFSSD[22] | 65.2 | 79.8 | 55.5 | 56.9 | 61.2 | 72.5 |
Ours300 | 68.5 | 80.7 | 59.7 | 58.4 | 68.2 | 75.6 |
Ours512 | 71.9 | 81.9 | 62.9 | 64.8 | 71.2 | 78.9 |
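The mAP column of Table 2 is consistent with the unweighted mean of the five small-object class APs listed (bird, bottle, plant, chair, boat), which is why it is lower than the VOC2007 mAP reported in Table 1. The short Python check below is an illustrative sketch, not the authors' evaluation code: it recomputes each row's mean and compares it with the reported value, allowing for the one-decimal rounding of the table.

```python
# Reported mAP and per-class AP (%) copied from Table 2, in the column order below.
CLASSES = ("bird", "bottle", "plant", "chair", "boat")
TABLE2 = {
    "Faster R-CNN": (55.9, (70.9, 52.1, 38.8, 52.0, 65.5)),
    "R-FCN": (67.4, (81.5, 62.8, 53.7, 67.0, 72.0)),
    "SSD": (61.7, (76.0, 50.5, 52.3, 60.3, 69.6)),
    "YOLOv3": (66.2, (78.6, 57.8, 56.5, 66.3, 71.9)),
    "DSSD": (63.1, (80.5, 53.9, 51.7, 61.1, 68.4)),
    "BFSSD": (65.2, (79.8, 55.5, 56.9, 61.2, 72.5)),
    "Ours300": (68.5, (80.7, 59.7, 58.4, 68.2, 75.6)),
    "Ours512": (71.9, (81.9, 62.9, 64.8, 71.2, 78.9)),
}


def mean_ap(aps):
    """mAP as the unweighted mean of per-class average precision values."""
    return sum(aps) / len(aps)


for name, (reported, aps) in TABLE2.items():
    assert len(aps) == len(CLASSES)
    recomputed = mean_ap(aps)
    # Table values carry one decimal, so a gap of up to 0.05 is just rounding.
    ok = abs(recomputed - reported) <= 0.05
    print(f"{name:<13} reported {reported:.1f}  recomputed {recomputed:.2f}  match={ok}")
```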
Table 3 Comparison of different algorithms on HRRSD dataset
Algorithm | mAP/% | Detection speed (1080Ti)/(frame/s) | AP/% (airplane) | AP/% (ship) | AP/% (storage tank) | AP/% (tennis court) |
---|---|---|---|---|---|---|
Faster R-CNN[3] | 72.4 | 11 | 74.3 | 78.7 | 71.9 | 64.5 |
R-FCN[4] | 74.9 | 27 | 76.6 | 80.3 | 74.2 | 68.5 |
SSD[5] | 76.5 | 62 | 79.5 | 81.9 | 75.2 | 69.4 |
YOLOv3[8] | 80.9 | 66 | 86.2 | 85.7 | 77.3 | 74.6 |
DSSD[20] | 78.9 | 13 | 81.9 | 84.9 | 78.4 | 70.5 |
R-SSD[21] | 77.7 | 35 | 80.7 | 83.2 | 77.1 | 69.8 |
Ours | 89.9 | 48 | 90.8 | 90.1 | 90.5 | 88.4 |
Table 4 Ablation studies on PASCAL VOC2007 test set
Algorithm | mAP/% |
---|---|
SSD | 77.2 |
SSD+Darknet-53 | 77.9 |
SSD+I-Darknet53 | 78.3 |
SSD+I-Darknet53+FEM | 78.6 |
SSD+I-Darknet53+FEM+Feature fusion | 79.3 |
SSD+I-Darknet53+FEM+Feature fusion+SE | 79.7 |
SSD+I-Darknet53+FEM+Feature fusion+CBAM | 79.9 |
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=3) | 80.2 |
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=5) | 79.9 |
SSD+I-Darknet53+FEM+Feature fusion+ECAM(K=7) | 79.8 |
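Table 4 compares SE [14], CBAM [15] and an ECAM block whose kernel size K is set to 3, 5 or 7. This page does not describe ECAM itself, so the NumPy sketch below assumes it follows the efficient channel attention (ECA) module of ECA-Net [16]: global average pooling per channel, a bias-free 1D convolution of width K across neighbouring channels, a sigmoid gate, and channel-wise rescaling. The function names `eca` and `adaptive_kernel_size` are hypothetical, and in a trained network the kernel weights would be learned rather than drawn at random.

```python
import math

import numpy as np


def eca(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """ECA-style channel attention on a (C, H, W) feature map (assumed design).

    `kernel` is a 1D weight vector of odd length K shared by all channels.
    """
    c = x.shape[0]
    k = kernel.shape[0]
    if k % 2 == 0:
        raise ValueError("kernel size K must be odd")
    y = x.mean(axis=(1, 2))          # global average pooling -> shape (C,)
    pad = (k - 1) // 2
    y_pad = np.pad(y, pad)           # zero padding so there are still C outputs
    # Bias-free 1D cross-correlation over each channel's K nearest neighbours.
    z = np.array([y_pad[i:i + k] @ kernel for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-z))  # sigmoid weights in (0, 1)
    return x * gate[:, None, None]   # rescale every channel of x


def adaptive_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """ECA-Net's rule for deriving an odd K from the channel count C."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


rng = np.random.default_rng(0)
feat = rng.standard_normal((256, 19, 19))  # arbitrary example feature map
k = 3                                      # the best fixed K in Table 4
out = eca(feat, rng.standard_normal(k))
print(out.shape, adaptive_kernel_size(256))
```

For a 256-channel map (an arbitrary example) the ECA-Net rule would pick K = 5, whereas Table 4 fixes K and finds K = 3 best; the adaptive rule is shown only for comparison. Because the gate depends on only K weights, the module adds far fewer parameters than the fully connected bottleneck in SE.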
References
[1] LIU Y, LIU H Y, FAN J L, et al. A survey of research and application of small object detection based on deep learning[J]. Acta Electronica Sinica, 2020, 48(3):590-601.
[2] LIU Y, ZHAN Y W. Survey of small object detection algorithms based on deep learning[J]. Computer Engineering and Applications, 2021, 57(2):37-48.
[3] REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th Annual Conference on Neural Information Processing Systems, Montreal, Dec 7-12, 2015. Red Hook: Curran Associates, 2015: 91-99.
[4] DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]// Proceedings of the 29th Annual Conference on Neural Information Processing Systems, Barcelona, Dec 5-10, 2016. Red Hook: Curran Associates, 2016: 379-387.
[5] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37.
[6] REDMON J, DIVVALA S K, GIRSHICK R B, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788.
[7] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6517-6525.
[8] REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv: 1804.02767, 2018.
[9] LI Z X, ZHOU F Q. FSSD: feature fusion single shot multibox detector[J]. arXiv: 1712.00960, 2017.
[10] LIU S T, HUANG D, WANG Y H. Receptive field block net for accurate and fast object detection[C]// LNCS 11215: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 404-419.
[11] CHEN H J, WANG Q Q, YANG G W, et al. SSD object detection algorithm with multi-scale convolution feature fusion[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(6):1049-1061.
[12] LIANG Y Y, LI J B. Small objects detection method based on multi-scale non-local attention network[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(10):1744-1753.
[13] MISRA D. Mish: a self regularized non-monotonic neural activation function[J]. arXiv: 1908.08681, 2019.
[14] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7132-7141.
[15] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19.
[16] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 11531-11539.
[17] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 618-626.
[18] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The Pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2):303-338.
[19] ZHANG Y L, YUAN Y, FENG Y C, et al. Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(8):5535-5548.
[20] FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[J]. arXiv: 1701.06659, 2017.
[21] JEONG J, PARK H, KWAK N. Enhancement of SSD by concatenating feature maps for object detection[J]. arXiv: 1705.09587, 2017.
[22] ZHAO H, LI Z W, FANG L F, et al. A balanced feature fusion SSD for object detection[J]. Neural Processing Letters, 2020, 51(3):2789-2806.
[23] WU X W, SAHOO D, ZHANG D X, et al. Single-shot bidirectional pyramid networks for high-quality object detection[J]. Neurocomputing, 2020, 401:1-9.