Improved YOLOv4-Tiny Lightweight Target Detection Algorithm

doi:10.3778/j.issn.1673-9418.2301034

Abstract

Abstract: Object detection is an important branch of deep learning. Many edge devices need lightweight object detection algorithms, but existing lightweight general-purpose detectors suffer from low detection accuracy and slow detection speed. To address this problem, an improved YOLOv4-Tiny algorithm based on attention mechanisms is proposed. The backbone network of the original YOLOv4-Tiny is restructured and the ECA (efficient channel attention) mechanism is introduced; the traditional spatial pyramid pooling (SPP) structure is improved into a DC-SPP structure by using dilated convolution; and a CSATT (channel spatial attention) mechanism is proposed, which is combined with the feature fusion network PAN (path aggregation network) to form the CSATT-PAN neck network, improving the network's feature fusion capability. Compared with the original YOLOv4-Tiny, the proposed YOLOv4-CSATT algorithm is markedly more sensitive to information and more accurate in classification at essentially the same detection speed: accuracy increases by 12.3 percentage points on the VOC dataset and by 6.4 percentage points on the COCO dataset. On the VOC dataset, its accuracy is 3.3, 5.5, 6.3, 17.4, 10.3, 0.9 and 0.6 percentage points higher than that of Faster R-CNN, SSD, Efficientdet-d1, YOLOv3-Tiny, YOLOv4-MobileNetv1, YOLOv4-MobileNetv2 and PP-YOLO, respectively, and its recall is 2.8, 7.1, 4.2, 18.0, 12.2, 2.1 and 4.0 percentage points higher, respectively, with an FPS of 94. The proposed CSATT attention mechanism improves the model's ability to capture spatial and channel information and, combined with the ECA attention mechanism and the feature pyramid fusion algorithm, improves the model's feature fusion ability and object detection accuracy.

Key words: object detection, YOLOv4-Tiny algorithm, attention mechanism, lightweight neural network, feature fusion
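The abstract names two building blocks that are well documented outside this paper: ECA channel attention and dilated convolution. The sketch below illustrates only those two ideas in plain Python, and it is not the authors' implementation. The abstract does not give the dilation rates of DC-SPP or the structure of CSATT, so neither is reproduced here. The function names are hypothetical, and the 1D kernel weights are passed in by the caller, whereas in a trained network they would be learned.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size as described for ECA-Net:
    the odd value derived from |log2(C) / gamma + b / gamma|."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_channel_weights(channel_means, kernel):
    """Slide a 1D kernel over the per-channel means (zero padded) and
    squash each response with a sigmoid, giving one weight per channel."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for i in range(len(channel_means)):
        response = sum(w * padded[i + j] for j, w in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-response)))
    return weights


def eca(feature_map, kernel):
    """Apply ECA-style channel reweighting.

    feature_map: list of channels, each a list of rows of floats.
    kernel: odd-length list of 1D weights (assumed given, not learned here).
    """
    # Global average pooling: one scalar per channel.
    means = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    weights = eca_channel_weights(means, kernel)
    # Scale every value in a channel by that channel's attention weight.
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(feature_map, weights)]


def dilated_kernel_extent(kernel_size, dilation):
    """Span covered by a dilated kernel along one axis: k + (k - 1)(d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)
```

The last helper shows why dilated convolution is a natural fit for an SPP-style block: a 3×3 kernel with a larger dilation covers a wider window with the same number of weights, so several receptive-field sizes can be combined cheaply.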

HE Xiangjie, SONG Xiaoning. Improved YOLOv4-Tiny Lightweight Target Detection Algorithm[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(1): 138-150.

