Survey of Development of YOLO Object Detection Algorithms

doi:10.3778/j.issn.1673-9418.2402044

Abstract

In recent years, deep learning-based object detection has been an active topic in computer vision research, and the YOLO (you only look once) family stands out among object detection algorithms. The evolution of its network architecture has played a crucial role in improving detection speed and accuracy. This paper presents a side-by-side analysis of the overall frameworks of YOLOv1 through YOLOv9, comparing their network architectures (backbone network, neck layers, and head layers) and loss functions. It discusses the strengths and limitations of the different improvement methods and evaluates how much each one improves model accuracy. The paper also covers dataset selection and construction methods, the rationale for choosing different evaluation metrics, and their applicability and limitations in various application scenarios. It further examines specific improvements to the YOLO algorithm in five application domains (industry, transportation, remote sensing, agriculture, and biology) and discusses the balance among detection speed, accuracy, and complexity in these domains. Finally, the paper analyzes the current state of YOLO development in these fields, summarizes open problems in YOLO research through specific examples, and, in light of trends in the application domains, offers an outlook on the future of the YOLO family, with a detailed discussion of four research directions: multi-task learning, edge computing, multimodal integration, and virtual and augmented reality technology.

Keywords: YOLO algorithm, object detection, computer vision, feature extraction, convolutional neural network

XU Yanwei, LI Jun, DONG Yuanfang, ZHANG Xiaoli. Survey of Development of YOLO Object Detection Algorithms[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2221-2238.

徐彦威, 李军, 董元方, 张小利. YOLO系列目标检测算法综述[J]. 计算机科学与探索, 2024, 18(9): 2221-2238.
