Improved YOLOv5 Traffic Light Real-Time Detection Robust Algorithm

Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (1): 231-241. DOI: 10.3778/j.issn.1673-9418.2105033

QIAN Wu, WANG Guozhong, LI Guoping⁺

School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China

Received: 2021-05-11 Revised: 2021-08-13 Online: 2022-01-01 Published: 2021-08-25
Corresponding author: ⁺ E-mail: liguoping@sues.edu.cn
About the authors: QIAN Wu, born in 1995, M.S. candidate. His research interests include computer vision and deep learning.
WANG Guozhong, born in 1962, Ph.D., professor, Ph.D. supervisor. His research interests include video codec, image processing and machine learning.
LI Guoping, born in 1974, Ph.D., senior engineer, M.S. supervisor. His research interests include audio and video coding, intelligent media processing, and machine learning and recognition.
Supported by:
National Key Research and Development Program of China (2019YFB1802700)

Abstract

Traffic light detection is a critical component of autonomous driving and bears directly on the driving safety of intelligent vehicles. The small scale of traffic lights and the complexity of road environments make the task difficult. To address these difficulties, this paper proposes a traffic light detection algorithm based on an improved YOLOv5. First, a visible label ratio is used to determine the model input size. Second, the ACBlock structure is introduced to strengthen the feature extraction ability of the backbone network, SoftPool is used to reduce the information lost during downsampling in the backbone, and DSConv convolution kernels are used to reduce the number of model parameters. Finally, a memory feature fusion network is designed to make efficient use of high-level semantic information and low-level features. The changes to the model input and the backbone directly improve feature extraction in complex environments, while the changes to the feature fusion network let the model make fuller use of feature information and improve the accuracy of target localization and bounding-box regression. Experimental results show that, on the BDD100K dataset, the improved models reach up to 74.3% AP (I-YOLOv5x), 11.0 percentage points above the best YOLOv5 variant, and up to 111 frame/s (I-YOLOv5s). On the Bosch dataset, they reach up to 84.4% AP (I-YOLOv5x), 9.3 percentage points above the best YOLOv5 variant, and up to 126 frame/s (I-YOLOv5s). Robustness tests show that the improved models detect targets markedly better across a wide range of complex environments, giving higher robustness together with high-precision real-time detection.

Key words: traffic light detection, YOLOv5, memory feature fusion network, BDD100K, real-time detection

CLC number: TP391.4

QIAN Wu, WANG Guozhong, LI Guoping. Improved YOLOv5 Traffic Light Real-Time Detection Robust Algorithm[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 231-241.

Figures and tables (16)

Fig.1 YOLOv5 network structure

Fig.2 I-YOLOv5 network structure

Table 1 Relationship between visible label ratio and model performance

$s$	Visible label ratio/%	Input size	YOLOv5s AP/%	YOLOv5s Time/ms	YOLOv5s GPU/GB	I-YOLOv5s AP/%	I-YOLOv5s Time/ms	I-YOLOv5s GPU/GB
10	88.61	$1024 \times 1024$	66.3	123	7.69	73.0	98	17.9
11	82.57	$960 \times 960$	65.6	123	7.65	73.0	99	17.7
12	75.77	$864 \times 864$	64.8	123	7.60	72.8	104	14.5
13	68.77	$800 \times 800$	64.0	122	7.10	72.3	111	12.7
14	62.35	$736 \times 736$	63.1	123	5.50	71.6	111	7.77
16	51.26	$640 \times 640$	61.6	132	5.24	69.9	122	7.38

Fig.3 ACBlock structure
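
The asymmetric convolution block of ACNet [28] runs a 3×3, a 1×3 and a 3×1 convolution in parallel and sums their outputs. Because convolution is linear, the three kernels can be added into a single 3×3 kernel that gives the same output. The sketch below illustrates only that additivity property in pure Python. It leaves out batch normalization, the function names are hypothetical, and this page does not state whether the authors fuse the branches at inference.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(x: Matrix, kernel: Matrix) -> Matrix:
    """Cross-correlate x with an odd-sized kernel, zero padded to keep the size."""
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - ph, j + dj - pw
                    if 0 <= r < h and 0 <= c < w:
                        acc += x[r][c] * kernel[di][dj]
            out[i][j] = acc
    return out


def fuse_acblock(k3x3: Matrix, k1x3: Matrix, k3x1: Matrix) -> Matrix:
    """Add the 1x3 kernel onto the middle row and the 3x1 kernel onto the middle column."""
    fused = [row[:] for row in k3x3]
    for j in range(3):
        fused[1][j] += k1x3[0][j]
    for i in range(3):
        fused[i][1] += k3x1[i][0]
    return fused


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


# Illustrative integer-valued input and kernels (not taken from the paper).
x = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 3.0, 1.0, 2.0],
     [4.0, 1.0, 2.0, 0.0],
     [1.0, 0.0, 3.0, 1.0]]
k3x3 = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0], [0.0, -1.0, 1.0]]
k1x3 = [[1.0, 2.0, 1.0]]
k3x1 = [[-1.0], [3.0], [2.0]]

# Three parallel branches summed, as in the training-time block.
branch_sum = add(add(conv2d_same(x, k3x3), conv2d_same(x, k1x3)),
                 conv2d_same(x, k3x1))
# One convolution with the fused kernel.
fused_out = conv2d_same(x, fuse_acblock(k3x3, k1x3, k3x1))
```

With integer-valued inputs the two results match exactly; with general floating-point weights they would agree up to rounding error.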

Fig.4 SoftPool down sampling process
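
As a rough illustration of the downsampling in Fig.4, SoftPool [29] replaces each pooling window by the softmax-weighted sum of its activations, so large activations dominate while smaller ones still contribute. The sketch below is a minimal pure-Python version for a single 2-D map with non-overlapping windows. The function name and the example values are hypothetical, and it is not the authors' implementation.

```python
import math
from typing import List


def softpool2d(x: List[List[float]], k: int = 2) -> List[List[float]]:
    """SoftPool over non-overlapping k x k windows of a 2-D activation map."""
    h, w = len(x), len(x[0])
    if h % k or w % k:
        raise ValueError("height and width must be divisible by k")
    out = []
    for i in range(0, h, k):
        row = []
        for j in range(0, w, k):
            window = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            # Softmax weights inside the window (shifted by the max for stability).
            m = max(window)
            exps = [math.exp(a - m) for a in window]
            total = sum(exps)
            # The output is the softmax-weighted sum of the activations.
            row.append(sum(e / total * a for e, a in zip(exps, window)))
        out.append(row)
    return out


# Illustrative 2 x 4 activation map (not taken from the paper).
feature_map = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 8.0],
]
pooled = softpool2d(feature_map)
```

Each pooled value lies between the window mean (average pooling) and the window maximum (max pooling), which is the sense in which less of the window's information is discarded.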

Fig.5 Regular convolution compared with DSConv

Fig.6 PANet and proposed feature fusion network structure

Fig.7 Heatmap of fused features

Table 2 Average IoU between predicted and ground-truth boxes

Model	Average IoU
YOLOv5+PANet	0.528
YOLOv5+Our FPN	0.591
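
For readers unfamiliar with the metric in Table 2, the following sketch shows how an average IoU over matched box pairs can be computed. It assumes axis-aligned boxes given as (x1, y1, x2, y2) and that each prediction has already been paired with a ground-truth box. How the paper performs that pairing is not stated on this page, and the function names and sample boxes are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_iou(pairs: Sequence[Tuple[Box, Box]]) -> float:
    """Mean IoU over already matched (prediction, ground truth) pairs."""
    return sum(iou(p, g) for p, g in pairs) / len(pairs)


# Illustrative pairs (not taken from the paper).
pairs = [
    ((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)),  # partial overlap
    ((5.0, 5.0, 7.0, 9.0), (5.0, 5.0, 7.0, 9.0)),  # perfect match
]
mean_iou = average_iou(pairs)
```

A higher average IoU, as reported for the proposed feature fusion network, means the predicted boxes sit closer to the annotated ones on average.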

Fig.8 Model training process

Table 3 Ablation experiments based on YOLOv5l

ACBlock	SoftPool	DSConv	Our FPN	AP/%	FLOPs/10⁹
—	—	—	—	62.9	117
√	—	—	—	68.3	198
—	√	—	—	65.7	115
—	—	√	—	62.7	112
—	—	—	√	68.5	92
√	√	—	—	69.9	195
√	√	√	—	69.6	191
√	√	√	√	73.2	166

Table 4 Test results of different models on BDDTL

Model	Input size	AP/%	Detection speed/(frame/s)
Dense-ACSSD[33]	448×448	10.27	35
YOLOv3	416×416	37.67	27
Gaussian YOLOv3	416×416	46.78	30
YOLOv5s	640×640	61.60	132
YOLOv5m	640×640	62.80	100
YOLOv5l	640×640	62.90	83
YOLOv5x	640×640	63.30	55
EfficientDet-D0	512×512	14.50	33
EfficientDet-D1	640×640	28.90	25
EfficientDet-D2	736×736	45.50	24
I-YOLOv5s	800×800	72.30(+9.0)	111
I-YOLOv5m	800×800	73.60(+10.3)	76
I-YOLOv5l	800×800	73.90(+10.6)	62
I-YOLOv5x	800×800	74.30(+11.0)	40
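
The bracketed gains in Table 4 appear to be measured against the best-scoring YOLOv5 variant (YOLOv5x, 63.30% AP) rather than against the same-sized baseline. That reading is an inference from the numbers, not something stated on this page. The short check below reproduces the four bracketed values under that assumption.

```python
# AP/% values copied from Table 4.
yolov5_ap = {"YOLOv5s": 61.60, "YOLOv5m": 62.80, "YOLOv5l": 62.90, "YOLOv5x": 63.30}
improved_ap = {"I-YOLOv5s": 72.30, "I-YOLOv5m": 73.60, "I-YOLOv5l": 73.90, "I-YOLOv5x": 74.30}

# Assumption: gains are relative to the best YOLOv5 result, not the same-size model.
best_baseline = max(yolov5_ap.values())
gains = {name: round(ap - best_baseline, 1) for name, ap in improved_ap.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.1f} percentage points over {best_baseline:.2f}% AP")
```

The same convention is consistent with Table 5, where 84.4 - 75.1 = 9.3 uses YOLOv5s, the best YOLOv5 variant on the Bosch dataset.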

Table 5 Test results of different models on Bosch

Model	Input size	AP/%	Detection speed/(frame/s)
YOLOv5s	640×640	75.1	130
YOLOv5m	640×640	67.6	100
YOLOv5l	640×640	67.9	83
YOLOv5x	640×640	74.2	52
I-YOLOv5s	800×800	82.8	126
I-YOLOv5m	800×800	82.8	91
I-YOLOv5l	800×800	82.9	71
I-YOLOv5x	800×800	84.4(+9.3)	46

Fig.9 Model detection effect

Table 6 Robustness test results of improved YOLOv5 and YOLOv5 (%)

Condition		YOLOv5s	YOLOv5m	YOLOv5l	YOLOv5x	I-YOLOv5s	I-YOLOv5m	I-YOLOv5l	I-YOLOv5x
size	small	59.3	60.4	60.3	60.7	69.9	71.2	71.5	71.9(+11.2)
	medium	76.3	78.7	79.0	79.3	85.6	86.6	87.2(+7.9)	87.2
	large	34.2	48.6	33.5	37.0	56.6	68.0(+19.4)	53.5	66.1
time	dawn/dusk	60.1	62.6	62.3	62.8	73.5	74.9	75.5	75.8(+13.0)
	daytime	63.5	64.9	65.0	65.8	76.1	77.9	78.5	78.6(+12.8)
	night	58.9	59.7	59.8	59.5	66.8	67.4	67.5	68.1(+8.3)
scene	city street	62.3	63.5	63.7	63.9	73.0	74.2	74.5(+10.6)	74.9
	gas station	71.6	54.9	56.9	54.3	58.7	72.1	77.3	80.4(+8.8)
	highway	56.7	58.0	57.7	58.1	67.9	68.4	69.5	69.6(+11.5)
	parking lot	43.4	48.4	41.7	40.3	56.3	61.8	69.5(+21.1)	64.4
	residential	60.6	62.3	62.4	63.4	70.6	73.4	74.1	74.8(+11.4)
	tunnel	68.6	47.9	83.9	90.4	77.9	82.2(-8.2)	71.3	78.2
weather	clear	61.0	62.0	61.9	62.4	70.6	71.6	71.6	72.1(+9.7)
	foggy	44.3	49.2	40.2	49.8	63.3	75.2(+25.4)	63.5	59.4
	overcast	65.1	67.0	66.2	67.6	77.6	79.4	79.3	80.2(+12.6)
	partly cloudy	64.4	65.3	66.4	66.5	77.1	78.0	79.2(+12.7)	78.6
	rainy	58.3	60.2	60.8	59.3	69.6	70.9	71.7(+10.9)	71.5
	snowy	61.6	62.6	63.8	62.5	71.4	72.6	73.9(+10.1)	73.9

Table 7 Robustness ablation experiment based on YOLOv5l

ACBlock	SoftPool	DSConv	Our FPN	$mAP_{size}$/%	$mAP_{time}$/%	$mAP_{scene}$/%	$mAP_{weather}$/%	$\Delta mAP$/%
—	—	—	—	57.6	62.4	61.1	59.9
√	—	—	—	62.8	67.5	64.3	65.2	4.7
—	√	—	—	60.5	65.1	63.6	62.9	2.8
—	—	√	—	57.2	62.1	61.0	59.6	-0.2
—	—	—	√	62.7	68.0	64.6	65.8	5.0
√	√	—	—	64.6	69.4	65.5	67.2	6.4
√	√	√	—	64.6	69.2	65.0	67.2	6.3
√	√	√	√	69.3	72.3	71.3	71.0	10.8
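
The last column of Table 7 looks like the average gain over the baseline row across the four mAP columns. This is an inference from the values, not a definition given on this page, and the row labels below are hypothetical shorthand. Recomputing it from the rounded table entries agrees with the reported figures to within 0.1, which is what rounding of the underlying values would allow.

```python
# (mAP_size, mAP_time, mAP_scene, mAP_weather) in %, copied from Table 7.
rows = {
    "baseline": (57.6, 62.4, 61.1, 59.9),
    "ACBlock": (62.8, 67.5, 64.3, 65.2),
    "SoftPool": (60.5, 65.1, 63.6, 62.9),
    "DSConv": (57.2, 62.1, 61.0, 59.6),
    "Our FPN": (62.7, 68.0, 64.6, 65.8),
    "ACBlock+SoftPool": (64.6, 69.4, 65.5, 67.2),
    "ACBlock+SoftPool+DSConv": (64.6, 69.2, 65.0, 67.2),
    "all four": (69.3, 72.3, 71.3, 71.0),
}
# Last column of Table 7 as reported.
reported = {
    "ACBlock": 4.7, "SoftPool": 2.8, "DSConv": -0.2, "Our FPN": 5.0,
    "ACBlock+SoftPool": 6.4, "ACBlock+SoftPool+DSConv": 6.3, "all four": 10.8,
}


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


base = mean(rows["baseline"])
# Assumed definition: mean of the four mAP columns minus the baseline mean.
delta = {name: mean(vals) - base for name, vals in rows.items() if name != "baseline"}
max_gap = max(abs(delta[name] - reported[name]) for name in reported)
```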

References (33)

[1]	KANOPOULOS N, VASANTHAVADA N, BAKER R L. Design of an image edge detection filter using the Sobel operator[J]. IEEE Journal of Solid-State Circuits, 1988, 23(2): 358-367.
[2]	ILLINGWORTH J, KITTLER J. The adaptive Hough transform[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1987(5): 690-698.
[3]	DUAN K B, KEERTHI S S. Which is the best multiclass SVM method? An empirical study[C]// LNCS 3541: Proceedings of the International Workshop on Multiple Classifier Systems, Seaside, Jun 13-15, 2005. Berlin, Heidelberg: Springer, 2005: 278-285.
[4]	OMACHI M, OMACHI S. Traffic light detection with color and edge information[C]// Proceedings of the 2009 2nd IEEE International Conference on Computer Science and Information Technology, Beijing, Aug 8-11, 2009. Piscataway: IEEE, 2009: 284-287.
[5]	LI Y, CAI Z, GU M, et al. Notice of retraction: traffic lights recognition based on morphology filtering and statistical classification[C]// Proceedings of the 2011 7th International Conference on Natural Computation, Shanghai, Jul 26-28, 2011. Washington: IEEE Computer Society, 2011: 1700-1704.
[6]	SERMANET P, EIGEN D, ZHANG X, et al. Overfeat: integrated recognition, localization and detection using convolutional networks[J]. arXiv:1312.6229, 2013.
[7]	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[J]. arXiv:1701.06659, 2017.
[8]	LI Z, ZHOU F. FSSD: feature fusion single shot multibox detector[J]. arXiv:1712.00960, 2017.
[9]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37.
[10]	BOCHKOVSKIY A, WANG C Y, LIAO H M. YOLOv4: optimal speed and accuracy of object detection[J]. arXiv:2004.10934, 2020.
[11]	REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[12]	Ultralytics. YOLOv5[EB/OL]. [2021-03-14]. https://github.com/ultralytics/yolov5.
[13]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788.
[14]	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 7263-7271.
[15]	TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 10781-10790.
[16]	DAI J, LI Y, HE K, et al. R-FCN: object detection via region-based fully convolutional networks[J]. arXiv:1605.06409, 2016.
[17]	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 20-23, 2014. Washington: IEEE Computer Society, 2014: 580-587.
[18]	GIRSHICK R B. Fast R-CNN[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1440-1448.
[19]	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. arXiv:1506.01497, 2015.
[20]	HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[21]	MANANA M, TU C, OWOLAWI P A. Preprocessed faster RCNN for vehicle detection[C]// Proceedings of the 2018 International Conference on Intelligent and Innovative Computing Applications, Plaine Magnien, Dec 6-7, 2018. Washington: IEEE Computer Society, 2018: 1-4.
[22]	WANG C, ZHANG G W, ZHOU W, et al. Traffic lights detection based on deep learning feature[C]// Proceedings of the 2019 International Conference on Internet of Things as a Service, Zurich, Nov 23-24, 2019. Cham: Springer, 2019: 382-396.
[23]	LIU J, ZHANG D. Research on vehicle object detection algorithm based on improved YOLOv3 algorithm[J]. Journal of Physics: Conference Series, 2020, 1575(1): 012150.
[24]	THIPSANTHIA P, CHAMCHONG R, SONGRAM P. Road sign detection and recognition of Thai traffic based on YOLOv3[C]// LNCS 11909: Proceedings of the 13th International Conference on Multi-disciplinary Trends in Artificial Intelligence, Kuala Lumpur, Nov 17-19, 2019. Cham: Springer, 2019: 271-279.
[25]	CHOI J, CHUN D, KIM H, et al. Gaussian YOLOv3: an accurate and fast object detector using localization uncertainty for autonomous driving[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 502-511.
[26]	YU F, CHEN H F, WANG X, et al. BDD100K: a diverse driving dataset for heterogeneous multitask learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 2633-2642.
[27]	WANG C Y, LIAO H Y M, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 1571-1580.
[28]	DING X H, GUO Y C, DING G G, et al. ACNet: strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 1911-1920.
[29]	STERGIOU A, POPPE R, KALLIATAKIS G. Refining activation downsampling with SoftPool[J]. arXiv:2101.00440, 2021.
[30]	LIN T Y, DOLLÁR P, GIRSHICK R B, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 936-944.
[31]	LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-23, 2018. Piscataway: IEEE, 2018: 8759-8768.
[32]	BEHRENDT K, NOVAK L, BOTROS R. A deep learning approach to traffic lights: detection, tracking, and classification[C]// Proceedings of the 2017 IEEE International Conference on Robotics and Automation, Singapore, May 29-Jun 3, 2017. Piscataway: IEEE, 2017: 1370-1377.
[33]	CHENG Z W, WANG Z Y, HUANG H C, et al. Dense-ACSSD for end-to-end traffic scenes recognition[C]// Proceedings of the 2019 IEEE Intelligent Vehicles Symposium, Paris, Jun 9-12, 2019. Piscataway: IEEE, 2019: 460-465.
