Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (11): 2575-2586.DOI: 10.3778/j.issn.1673-9418.2102001
• Graphics and Image •
LI Qingyuan1, DENG Zhaohong1,2,3,+, LUO Xiaoqing1, GU Xin4, WANG Shitong1
Received: 2021-02-01
Revised: 2021-03-18
Online: 2022-11-01
Published: 2021-03-25
About author: LI Qingyuan, born in 1997 in Weifang, Shandong, M.S. candidate. His research interest is deep learning.
Corresponding author: + E-mail: dengzhaohong@jiangnan.edu.cn
LI Qingyuan, DENG Zhaohong, LUO Xiaoqing, GU Xin, WANG Shitong. SSD Object Detection Algorithm with Attention and Cross-Scale Fusion[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(11): 2575-2586.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2102001
Model | mAP | aero | bike | bird | boat | bottle | bus | car | cat | chair | cow | table | dog | horse | mbike | person | plant | sheep | sofa | train | tv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SSD | 77.5 | 79.5 | 83.9 | 76.0 | 69.6 | 50.5 | 87.0 | 85.7 | 88.1 | 60.3 | 81.5 | 77.0 | 86.1 | 87.5 | 84.0 | 79.4 | 51.7 | 77.9 | 79.5 | 87.6 | 76.8 |
DSSD | 78.6 | 81.9 | 84.9 | 80.5 | 68.4 | 53.9 | 85.6 | 86.2 | 88.9 | 61.1 | 83.5 | 78.7 | 86.7 | 88.7 | 86.7 | 79.7 | 51.7 | 78.0 | 80.9 | 87.2 | 79.4 |
ION | 79.2 | 80.2 | 85.2 | 78.8 | 70.9 | 62.6 | 86.6 | 86.9 | 89.8 | 61.7 | 86.9 | 76.5 | 88.4 | 87.5 | 83.4 | 80.5 | 52.4 | 78.1 | 77.2 | 86.9 | 83.5 |
Ours | 80.4 | 84.9 | 87.0 | 79.5 | 75.5 | 59.5 | 86.7 | 87.4 | 89.0 | 67.5 | 85.0 | 80.0 | 86.6 | 88.0 | 86.2 | 81.9 | 57.1 | 79.1 | 81.1 | 86.3 | 79.7 |
Table 1
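As a quick consistency check on Table 1, the sketch below recomputes the mAP column from the per-class values and lists the per-class gain of Ours over SSD. It assumes the mAP column is the unweighted mean of the 20 per-class APs (the usual PASCAL VOC convention, not stated on this page); the names `CLASSES`, `AP`, `mean_ap`, and `per_class_gain` are illustrative only.

```python
# Per-class AP (%) copied from the SSD and Ours rows of Table 1.
# Class order follows the table header.
CLASSES = ["aero", "bike", "bird", "boat", "bottle", "bus", "car", "cat",
           "chair", "cow", "table", "dog", "horse", "mbike", "person",
           "plant", "sheep", "sofa", "train", "tv"]

AP = {
    "SSD": [79.5, 83.9, 76.0, 69.6, 50.5, 87.0, 85.7, 88.1, 60.3, 81.5,
            77.0, 86.1, 87.5, 84.0, 79.4, 51.7, 77.9, 79.5, 87.6, 76.8],
    "Ours": [84.9, 87.0, 79.5, 75.5, 59.5, 86.7, 87.4, 89.0, 67.5, 85.0,
             80.0, 86.6, 88.0, 86.2, 81.9, 57.1, 79.1, 81.1, 86.3, 79.7],
}


def mean_ap(per_class_ap):
    """Unweighted mean over classes (assumed definition of the mAP column)."""
    return sum(per_class_ap) / len(per_class_ap)


def per_class_gain(base, new):
    """Difference new - base for every class, rounded to one decimal."""
    return {c: round(n - b, 1) for c, b, n in zip(CLASSES, base, new)}


gains = per_class_gain(AP["SSD"], AP["Ours"])
best_class = max(gains, key=gains.get)  # class with the largest improvement

for model, values in AP.items():
    print(model, round(mean_ap(values), 1))
print("largest gain:", best_class, gains[best_class])
```

Under that assumption the recomputed means agree with the reported mAP values for both rows, and the per-class differences show where the gain over SSD is concentrated.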
Model | mAP | aero | bike | bird | boat | bottle | bus | car | cat | chair | cow | table | dog | horse | mbike | person | plant | sheep | sofa | train | tv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SSD512 | 76.7 | 88.8 | 84.8 | 77.0 | 61.0 | 56.3 | 82.6 | 82.4 | 92.6 | 58.4 | 80.7 | 61.4 | 90.4 | 87.2 | 86.9 | 85.0 | 53.1 | 81.2 | 65.9 | 86.4 | 72.0 |
Ours512 | 78.5 | 91.0 | 87.9 | 79.8 | 63.6 | 60.3 | 84.6 | 83.5 | 92.8 | 60.9 | 82.2 | 64.3 | 91.2 | 86.8 | 88.3 | 87.1 | 57.1 | 85.1 | 66.1 | 84.0 | 73.8 |
Table 2
Algorithm | Network | Detection speed/(frame/s) | GPU | Number of anchor boxes | Input size | mAP/% |
---|---|---|---|---|---|---|
Faster R-CNN | VGG-16 | 7.0 | Titan X | 6 000 | | 73.2 |
Faster R-CNN | ResNet-101 | 2.4 | K40 | 300 | | 76.4 |
R-FCN | ResNet-50 | — | — | 300 | | 77.0 |
R-FCN | ResNet-101 | 5.8 | K40 | 300 | | 79.5 |
YOLOv2 | Darknet-19 | 81.0 | Titan X | — | | 73.7 |
SSD300 | VGG-16 | 92.0 | 2080Ti | 8 732 | | 77.5 |
FSSD300 | VGG-16 | 65.8 | 1080Ti | 8 732 | | 78.8 |
RefineDet320 | VGG-16 | 12.9 | K80 | 6 375 | | 79.5 |
RSSD300 | VGG-16 | 35.0 | Titan X | 8 732 | | 78.5 |
DSSD321 | ResNet-101 | 9.5 | Titan X | 17 080 | | 78.6 |
ASSD300 | VGG-16 | 11.8 | K40 | 8 732 | | 80.0 |
SSD512 | VGG-16 | 45.0 | 2080Ti | 24 564 | | 79.5 |
DSSD513 | ResNet-101 | 5.5 | Titan X | 43 688 | | 81.5 |
FSSD512 | VGG-16 | 35.7 | 1080Ti | 24 564 | | 80.9 |
RSSD512 | VGG-16 | 16.6 | Titan X | 24 564 | | 80.8 |
ASSD512 | VGG-16 | 3.4 | K40 | 24 564 | | 81.6 |
RefineDet512 | VGG-16 | 5.6 | K80 | 16 320 | | 81.2 |
Ours300 | VGG-16 | 44.8 | 2080Ti | 8 732 | | 80.4 |
Ours512 | VGG-16 | 22.5 | 2080Ti | 24 564 | | 82.2 |
Table 3 Comparison of detection speed and accuracy on PASCAL VOC2007 test dataset
Method | Detection speed/(frame/s) | mAP/% |
---|---|---|
SSD | 92.0 | 77.5 |
SSD* | 77.0 | 78.1 |
SSD*+EM | 69.3 | 78.5 |
SSD*+EM+FM | 46.7 | 79.7 |
SSD*+EM+FM+NCA | 44.8 | 80.4 |
Table 4 Comparative results of ablation experiments
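To read the ablation in Table 4 as a speed/accuracy trade-off, the following sketch computes the change in detection speed and mAP from each row to the next. It only uses the numbers in the table; the row labels are kept exactly as printed, and the names `ABLATION` and `step_deltas` are illustrative.

```python
# (method, detection speed in frame/s, mAP in %) copied from Table 4.
ABLATION = [
    ("SSD", 92.0, 77.5),
    ("SSD*", 77.0, 78.1),
    ("SSD*+EM", 69.3, 78.5),
    ("SSD*+EM+FM", 46.7, 79.7),
    ("SSD*+EM+FM+NCA", 44.8, 80.4),
]


def step_deltas(rows):
    """Change in speed and mAP of each row relative to the row before it."""
    deltas = []
    for (_, prev_fps, prev_map), (name, fps, m_ap) in zip(rows, rows[1:]):
        deltas.append((name, round(fps - prev_fps, 1), round(m_ap - prev_map, 1)))
    return deltas


for name, d_fps, d_map in step_deltas(ABLATION):
    print(f"{name}: {d_fps:+.1f} frame/s, {d_map:+.1f} mAP")
```

Read this way, the step that adds FM accounts for both the largest mAP increase and the largest drop in frame rate, while the final NCA step costs comparatively little speed.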
[1] | GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 23-28, 2014. Washington: IEEE Computer Society, 2014: 580-587. |
[2] | GIRSHICK R. Fast R-CNN[J]. arXiv:1504.08083, 2015. |
[3] | REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. |
[4] | RAJARAM R N, OHN-BAR E, TRIVEDI M M. RefineNet: iterative refinement for accurate object localization[C]// Proceedings of the 19th IEEE International Conference on Intelligent Transportation Systems, Rio de Janeiro, Nov 1-4, 2016. Piscataway: IEEE, 2016: 1528-1533. |
[5] | REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788. |
[6] | REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6517-6525. |
[7] | REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018. |
[8] | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37. |
[9] | LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 2999-3007. |
[10] | FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D A, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627-1645. |
[11] | LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 936-944. |
[12] | LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 8759-8768. |
[13] | ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6230-6239. |
[14] | EVERINGHAM M, ESLAMI S, GOOL L, et al. The PASCAL visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2014, 111(1): 98-136. |
[15] | LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110. |
[16] | DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, Jun 20-26, 2005. Washington: IEEE Computer Society, 2005: 886-893. |
[17] | FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[J]. arXiv:1701.06659, 2017. |
[18] | LI Z, ZHOU F. FSSD: feature fusion single shot multibox detector[J]. arXiv:1712.00960, 2017. |
[19] | SHEN Z, LIU Z, LI J, et al. DSOD: learning deeply supervised object detectors from scratch[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1937-1945. |
[20] | HUANG G, LIU Z, WEINBERGER K Q. Densely connected convolutional networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 2261-2269. |
[21] | PANG J, CHEN K, SHI J, et al. Libra R-CNN: towards balanced learning for object detection[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 821-830. |
[22] | BELL S, ZITNICK C L, BALA K, et al. Inside-Outside net: detecting objects in context with skip pooling and recurrent neural networks[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 2874-2883. |
[23] | KONG T, YAO A, CHEN Y, et al. HyperNet: towards accurate region proposal generation and joint object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 845-853. |
[24] | HARIHARAN B, ARBELÁEZ P P, GIRSHICK R B, et al. Hypercolumns for object segmentation and fine-grained localization[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 447-456. |
[25] | HU J, SHEN L, ALBANIE S, et al. Squeeze-and-Excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023. |
[26] | WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19. |
[27] | WANG X, GIRSHICK R B, GUPTA A, et al. Non-local neural networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7794-7803. |
[28] | CAO Y, XU J, LIN S, et al. GCNet: non-local networks meet squeeze-excitation networks and beyond[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Oct 27-28, 2019. Piscataway: IEEE, 2019: 1971-1980. |
[29] | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014. |
[30] | ZHOU B, KHOSLA A, LAPEDRIZA À, et al. Object detectors emerge in deep scene CNNs[J]. arXiv:1412.6856, 2014. |
[31] | DAI J, LI Y, HE K, et al. R-FCN: object detection via region-based fully convolutional networks[J]. arXiv:1605.06409, 2016. |
[32] | JEONG J, PARK H, KWAK N. Enhancement of SSD by concatenating feature maps for object detection[J]. arXiv:1705.09587, 2017. |
[33] | YI J, WU P, METAXAS D. ASSD: attentive single shot multibox detector[J]. Computer Vision and Image Understanding, 2019, 189: 102827. |
[34] | SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2): 336-359. |