Pedestrian Detection in Fisheye Images Based on Improved YOLOv8 Algorithm

doi:10.3778/j.issn.1673-9418.2404037

Abstract: Existing object detection algorithms localize pedestrians inaccurately and achieve insufficient detection accuracy in fisheye images. To address these problems, an improved YOLOv8 algorithm for fisheye image detection is proposed. First, the ProbIoU-r algorithm is designed by adding an angle parameter: a scaling factor adjusts the influence of the angle difference on the loss, which strengthens the model's attention to the angular offset of the bounding box during gradient computation. This addresses the inaccurate localization and poor bounding-box fitting of the original IoU in rotated object detection and gives the YOLOv8 network a better ability to perceive rotated targets. Second, to improve feature extraction for distorted targets in fisheye images while also raising detection accuracy, a Parnet-gcs module is proposed whose branches combine multi-scale convolution and attention mechanisms: DWConv layers with different kernel sizes extract feature information at different scales, and CA and SA modules are combined with them to enhance the model's feature representation. Experiments on the public fisheye image dataset WEPDTOF show that the improved algorithm raises mAP0.50:0.95 by 2.3 percentage points over the original YOLOv8s. Compared with YOLOv8m, it has 38.8% fewer parameters while achieving an mAP0.50:0.95 that is 0.5 percentage points higher, indicating that the improved YOLOv8s-based algorithm is better suited to pedestrian detection in fisheye images.

Key words: object detection, YOLOv8, attention mechanism, fisheye image
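The abstract does not give the ProbIoU-r formula, so the following is only a minimal sketch of the general idea, not the authors' method. It implements ProbIoU as defined for Gaussian bounding boxes (each rotated box becomes a 2-D Gaussian, and similarity is one minus the Hellinger distance), then adds a scaled angle term. The function names, the `angle_scale` parameter, and the `sin²` form of the angle penalty are all assumptions introduced here for illustration.

```python
import math

def box_to_gaussian(box):
    """Convert a rotated box (cx, cy, w, h, theta) to a 2-D Gaussian.

    Returns the mean (cx, cy) and the covariance entries (a, b, c) of
    [[a, c], [c, b]] = R diag(w^2/12, h^2/12) R^T. Requires w, h > 0.
    """
    cx, cy, w, h, theta = box
    var_w, var_h = w * w / 12.0, h * h / 12.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    a = var_w * cos_t ** 2 + var_h * sin_t ** 2
    b = var_w * sin_t ** 2 + var_h * cos_t ** 2
    c = (var_w - var_h) * cos_t * sin_t
    return (cx, cy), (a, b, c)

def prob_iou(box1, box2):
    """ProbIoU = 1 - Hellinger distance between the two box Gaussians."""
    (x1, y1), (a1, b1, c1) = box_to_gaussian(box1)
    (x2, y2), (a2, b2, c2) = box_to_gaussian(box2)
    # Averaged covariance used by the Bhattacharyya distance.
    a, b, c = (a1 + a2) / 2, (b1 + b2) / 2, (c1 + c2) / 2
    det = a * b - c * c
    det1 = a1 * b1 - c1 * c1
    det2 = a2 * b2 - c2 * c2
    dx, dy = x1 - x2, y1 - y2
    # d^T Sigma^{-1} d for the 2x2 averaged covariance.
    maha = (b * dx * dx - 2 * c * dx * dy + a * dy * dy) / det
    bd = maha / 8 + 0.5 * math.log(det / math.sqrt(det1 * det2))
    bc = math.exp(-bd)                      # Bhattacharyya coefficient
    hd = math.sqrt(max(0.0, 1.0 - bc))      # Hellinger distance
    return 1.0 - hd

def angle_aware_loss(pred, target, angle_scale=0.5):
    """Hypothetical loss: (1 - ProbIoU) plus a scaled angle penalty.

    sin^2 is used because a box is unchanged by a 180-degree rotation;
    this is an illustrative choice, not the formula from the paper.
    """
    d_theta = pred[4] - target[4]
    return (1.0 - prob_iou(pred, target)) + angle_scale * math.sin(d_theta) ** 2

if __name__ == "__main__":
    square = (0.0, 0.0, 10.0, 10.0, 0.0)
    rotated_square = (0.0, 0.0, 10.0, 10.0, math.pi / 4)
    print("ProbIoU:", prob_iou(square, rotated_square))
    print("angle-aware loss:", angle_aware_loss(square, rotated_square))
```

One motivation for an explicit angle term shows up in the usage example: a square box maps to an isotropic Gaussian, so plain ProbIoU cannot distinguish it from a rotated copy of itself, whereas the added term still penalizes the angle difference.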

ZHU Yumin, SUN Guangling, MIAO Fei. Pedestrian Detection in Fisheye Images Based on Improved YOLOv8 Algorithm[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(2): 443-453.
