Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (1): 231-241. DOI: 10.3778/j.issn.1673-9418.2105033
• Graphics and Image •
QIAN Wu, WANG Guozhong, LI Guoping+
Received: 2021-05-11
Revised: 2021-08-13
Online: 2022-01-01
Published: 2021-08-25
About author: QIAN Wu, born in 1995, M.S. candidate. His research interests include computer vision and deep learning.
Corresponding author: + E-mail: liguoping@sues.edu.cn
QIAN Wu, WANG Guozhong, LI Guoping. Improved YOLOv5 Traffic Light Real-Time Detection Robust Algorithm[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 231-241.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2105033
| Input size | Visible label ratio/% | YOLOv5s AP/% | YOLOv5s Time/ms | YOLOv5s GPU/GB | I-YOLOv5s AP/% | I-YOLOv5s Time/ms | I-YOLOv5s GPU/GB |
|---|---|---|---|---|---|---|---|
| 10 | 88.61 | 66.3 | 123 | 7.69 | 73.0 | 98 | 17.9 |
| 11 | 82.57 | 65.6 | 123 | 7.65 | 73.0 | 99 | 17.7 |
| 12 | 75.77 | 64.8 | 123 | 7.60 | 72.8 | 104 | 14.5 |
| 13 | 68.77 | 64.0 | 122 | 7.10 | 72.3 | 111 | 12.7 |
| 14 | 62.35 | 63.1 | 123 | 5.50 | 71.6 | 111 | 7.77 |
| 16 | 51.26 | 61.6 | 132 | 5.24 | 69.9 | 122 | 7.38 |
Table 1 Relationship between visible label ratio and model performance
| Model | Average IoU |
|---|---|
| YOLOv5+PANet | 0.528 |
| YOLOv5+Our FPN | 0.591 |
Table 2 Average IoU of predicted and ground-truth boxes
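The average IoU in Table 2 measures how well predicted boxes overlap their matched ground-truth boxes. For reference only, the following is a minimal sketch of the standard IoU computation for axis-aligned boxes in (x1, y1, x2, y2) format; it is not the authors' evaluation code, and the matching of predictions to ground truth is assumed to be done elsewhere (the box values below are toy examples).

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# Average IoU over matched (prediction, ground truth) pairs, as reported in Table 2.
pairs = [((48, 30, 60, 58), (50, 32, 61, 60)),
         ((120, 40, 131, 66), (118, 42, 130, 68))]  # hypothetical matched pairs
avg_iou = sum(iou(p, g) for p, g in pairs) / len(pairs)
```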
| ACBlock | SoftPool | DSConv | Our FPN | AP/% | FLOPs/10⁹ |
|---|---|---|---|---|---|
| — | — | — | — | 62.9 | 117 |
| √ | — | — | — | 68.3 | 198 |
| — | √ | — | — | 65.7 | 115 |
| — | — | √ | — | 62.7 | 112 |
| — | — | — | √ | 68.5 | 92 |
| √ | √ | — | — | 69.9 | 195 |
| √ | √ | √ | — | 69.6 | 191 |
| √ | √ | √ | √ | 73.2 | 166 |
Table 3 Ablation experiments based on YOLOv5l
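The components ablated in Table 3 correspond to published building blocks: asymmetric convolution blocks from ACNet [28], SoftPool downsampling [29], and depthwise separable convolutions. The PyTorch sketch below only illustrates how such modules are commonly written; class names, channel arguments, and activation choices are illustrative assumptions, and the authors' exact implementations (including how the blocks are wired into the YOLOv5 backbone and the proposed FPN) may differ.

```python
# Illustrative sketches of ACBlock, DSConv and SoftPool; not the paper's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACBlock(nn.Module):
    """Asymmetric convolution block (ACNet [28]): parallel 3x3, 1x3 and 3x1 branches summed."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.square = nn.Conv2d(c_in, c_out, 3, stride, padding=1, bias=False)
        self.hor = nn.Conv2d(c_in, c_out, (1, 3), stride, padding=(0, 1), bias=False)
        self.ver = nn.Conv2d(c_in, c_out, (3, 1), stride, padding=(1, 0), bias=False)
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x):
        return F.silu(self.bn(self.square(x) + self.hor(x) + self.ver(x)))

class DSConv(nn.Module):
    """Depthwise separable convolution: per-channel 3x3 followed by 1x1 pointwise."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.dw = nn.Conv2d(c_in, c_in, 3, stride, padding=1, groups=c_in, bias=False)
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x):
        return F.silu(self.bn(self.pw(self.dw(x))))

def soft_pool2d(x, kernel_size=2, stride=2):
    """SoftPool [29]: activations weighted by their softmax within each pooling window.
    (The reference implementation additionally handles numerical stability.)"""
    w = torch.exp(x)
    return F.avg_pool2d(x * w, kernel_size, stride) / F.avg_pool2d(w, kernel_size, stride)
```

A known property of the ACNet design is that the three branches can be fused into a single 3×3 kernel after training, which is the usual motivation for adopting it.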
| Model | Input size | AP/% | Detection speed/(frame/s) |
|---|---|---|---|
| Dense-ACSSD[33] | 448×448 | 10.27 | 35 |
| YOLOv3 | 416×416 | 37.67 | 27 |
| Gaussian YOLOv3 | 416×416 | 46.78 | 30 |
| YOLOv5s | 640×640 | 61.60 | 132 |
| YOLOv5m | 640×640 | 62.80 | 100 |
| YOLOv5l | 640×640 | 62.90 | 83 |
| YOLOv5x | 640×640 | 63.30 | 55 |
| EfficientDet-D0 | 512×512 | 14.50 | 33 |
| EfficientDet-D1 | 640×640 | 28.90 | 25 |
| EfficientDet-D2 | 736×736 | 45.50 | 24 |
| I-YOLOv5s | 800×800 | 72.30 (+9.0) | 111 |
| I-YOLOv5m | 800×800 | 73.60 (+10.3) | 76 |
| I-YOLOv5l | 800×800 | 73.90 (+10.6) | 62 |
| I-YOLOv5x | 800×800 | 74.30 (+11.0) | 40 |
Table 4 Test results of different models on BDDTL
| Model | Input size | AP/% | Detection speed/(frame/s) |
|---|---|---|---|
| YOLOv5s | 640×640 | 75.1 | 130 |
| YOLOv5m | 640×640 | 67.6 | 100 |
| YOLOv5l | 640×640 | 67.9 | 83 |
| YOLOv5x | 640×640 | 74.2 | 52 |
| I-YOLOv5s | 800×800 | 82.8 | 126 |
| I-YOLOv5m | 800×800 | 82.8 | 91 |
| I-YOLOv5l | 800×800 | 82.9 | 71 |
| I-YOLOv5x | 800×800 | 84.4 (+9.3) | 46 |
Table 5 Test results of different models on Bosch
| Condition | | YOLOv5s | YOLOv5m | YOLOv5l | YOLOv5x | I-YOLOv5s | I-YOLOv5m | I-YOLOv5l | I-YOLOv5x |
|---|---|---|---|---|---|---|---|---|---|
| size | small | 59.3 | 60.4 | 60.3 | 60.7 | 69.9 | 71.2 | 71.5 | 71.9 (+11.2) |
| | medium | 76.3 | 78.7 | 79.0 | 79.3 | 85.6 | 86.6 | 87.2 (+7.9) | 87.2 |
| | large | 34.2 | 48.6 | 33.5 | 37.0 | 56.6 | 68.0 (+19.4) | 53.5 | 66.1 |
| time | dawn/dusk | 60.1 | 62.6 | 62.3 | 62.8 | 73.5 | 74.9 | 75.5 | 75.8 (+13.0) |
| | daytime | 63.5 | 64.9 | 65.0 | 65.8 | 76.1 | 77.9 | 78.5 | 78.6 (+12.8) |
| | night | 58.9 | 59.7 | 59.8 | 59.5 | 66.8 | 67.4 | 67.5 | 68.1 (+8.3) |
| scene | city street | 62.3 | 63.5 | 63.7 | 63.9 | 73.0 | 74.2 | 74.5 (+10.6) | 74.9 |
| | gas station | 71.6 | 54.9 | 56.9 | 54.3 | 58.7 | 72.1 | 77.3 | 80.4 (+8.8) |
| | highway | 56.7 | 58.0 | 57.7 | 58.1 | 67.9 | 68.4 | 69.5 | 69.6 (+11.5) |
| | parking lot | 43.4 | 48.4 | 41.7 | 40.3 | 56.3 | 61.8 | 69.5 (+21.1) | 64.4 |
| | residential | 60.6 | 62.3 | 62.4 | 63.4 | 70.6 | 73.4 | 74.1 | 74.8 (+11.4) |
| | tunnel | 68.6 | 47.9 | 83.9 | 90.4 | 77.9 | 82.2 (-8.2) | 71.3 | 78.2 |
| weather | clear | 61.0 | 62.0 | 61.9 | 62.4 | 70.6 | 71.6 | 71.6 | 72.1 (+9.7) |
| | foggy | 44.3 | 49.2 | 40.2 | 49.8 | 63.3 | 75.2 (+25.4) | 63.5 | 59.4 |
| | overcast | 65.1 | 67.0 | 66.2 | 67.6 | 77.6 | 79.4 | 79.3 | 80.2 (+12.6) |
| | partly cloudy | 64.4 | 65.3 | 66.4 | 66.5 | 77.1 | 78.0 | 79.2 (+12.7) | 78.6 |
| | rainy | 58.3 | 60.2 | 60.8 | 59.3 | 69.6 | 70.9 | 71.7 (+10.9) | 71.5 |
| | snowy | 61.6 | 62.6 | 63.8 | 62.5 | 71.4 | 72.6 | 73.9 (+10.1) | 73.9 |
Table 6 Robustness test results of YOLOv5 and improved YOLOv5 (AP/%)
| ACBlock | SoftPool | DSConv | Our FPN | Size mAP/% | Time mAP/% | Scene mAP/% | Weather mAP/% | ΔmAP/% |
|---|---|---|---|---|---|---|---|---|
| — | — | — | — | 57.6 | 62.4 | 61.1 | 59.9 | — |
| √ | — | — | — | 62.8 | 67.5 | 64.3 | 65.2 | 4.7 |
| — | √ | — | — | 60.5 | 65.1 | 63.6 | 62.9 | 2.8 |
| — | — | √ | — | 57.2 | 62.1 | 61.0 | 59.6 | -0.2 |
| — | — | — | √ | 62.7 | 68.0 | 64.6 | 65.8 | 5.0 |
| √ | √ | — | — | 64.6 | 69.4 | 65.5 | 67.2 | 6.4 |
| √ | √ | √ | — | 64.6 | 69.2 | 65.0 | 67.2 | 6.3 |
| √ | √ | √ | √ | 69.3 | 72.3 | 71.3 | 71.0 | 10.8 |
Table 7 Robustness ablation experiments based on YOLOv5l
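The four mAP columns in Table 7 match the per-group averages (size, time, scene, weather) that can be derived from Table 6 for the YOLOv5l baseline, and the ΔmAP column is consistent with the mean gain of those four values over the baseline row, up to 0.1 rounding. A quick check on the ACBlock-only row, assuming that interpretation:

```python
# Values copied from Table 7; interpretation of the columns is an assumption verified against Table 6.
baseline     = [57.6, 62.4, 61.1, 59.9]  # size, time, scene, weather mAP of plain YOLOv5l
acblock_only = [62.8, 67.5, 64.3, 65.2]  # row with only ACBlock enabled

delta_map = sum(acblock_only) / len(acblock_only) - sum(baseline) / len(baseline)
print(f"{delta_map:.1f}")  # 4.7, matching the table's ΔmAP column
```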
[1] KANOPOULOS N, VASANTHAVADA N, BAKER R L. Design of an image edge detection filter using the Sobel operator[J]. IEEE Journal of Solid-State Circuits, 1988, 23(2): 358-367.
[2] ILLINGWORTH J, KITTLER J. The adaptive Hough transform[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1987(5): 690-698.
[3] DUAN K B, KEERTHI S S. Which is the best multiclass SVM method? An empirical study[C]// LNCS 3541: Proceedings of the International Workshop on Multiple Classifier Systems, Seaside, Jun 13-15, 2005. Berlin, Heidelberg: Springer, 2005: 278-285.
[4] OMACHI M, OMACHI S. Traffic light detection with color and edge information[C]// Proceedings of the 2009 2nd IEEE International Conference on Computer Science and Information Technology, Beijing, Aug 8-11, 2009. Piscataway: IEEE, 2009: 284-287.
[5] LI Y, CAI Z, GU M, et al. Notice of retraction: traffic lights recognition based on morphology filtering and statistical classification[C]// Proceedings of the 2011 7th International Conference on Natural Computation, Shanghai, Jul 26-28, 2011. Washington: IEEE Computer Society, 2011: 1700-1704.
[6] SERMANET P, EIGEN D, ZHANG X, et al. OverFeat: integrated recognition, localization and detection using convolutional networks[J]. arXiv:1312.6229, 2013.
[7] FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[J]. arXiv:1701.06659, 2017.
[8] LI Z, ZHOU F. FSSD: feature fusion single shot multibox detector[J]. arXiv:1712.00960, 2017.
[9] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37.
[10] BOCHKOVSKIY A, WANG C Y, LIAO H M. YOLOv4: optimal speed and accuracy of object detection[J]. arXiv:2004.10934, 2020.
[11] REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018.
[12] Ultralytics. YOLOv5[EB/OL]. [2021-03-14]. https://github.com/ultralytics/yolov5.
[13] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788.
[14] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 7263-7271.
[15] TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 10781-10790.
[16] DAI J, LI Y, HE K, et al. R-FCN: object detection via region-based fully convolutional networks[J]. arXiv:1605.06409, 2016.
[17] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 20-23, 2014. Washington: IEEE Computer Society, 2014: 580-587.
[18] GIRSHICK R B. Fast R-CNN[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1440-1448.
[19] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. arXiv:1506.01497, 2015.
[20] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[21] MANANA M, TU C, OWOLAWI P A. Preprocessed faster RCNN for vehicle detection[C]// Proceedings of the 2018 International Conference on Intelligent and Innovative Computing Applications, Plaine Magnien, Dec 6-7, 2018. Washington: IEEE Computer Society, 2018: 1-4.
[22] WANG C, ZHANG G W, ZHOU W, et al. Traffic lights detection based on deep learning feature[C]// Proceedings of the 2019 International Conference on Internet of Things as a Service, Zurich, Nov 23-24, 2019. Cham: Springer, 2019: 382-396.
[23] LIU J, ZHANG D. Research on vehicle object detection algorithm based on improved YOLOv3 algorithm[J]. Journal of Physics: Conference Series, 2020, 1575(1): 012150.
[24] THIPSANTHIA P, CHAMCHONG R, SONGRAM P. Road sign detection and recognition of Thai traffic based on YOLOv3[C]// LNCS 11909: Proceedings of the 13th International Conference on Multi-disciplinary Trends in Artificial Intelligence, Kuala, Nov 17-19, 2019. Cham: Springer, 2019: 271-279.
[25] CHOI J, CHUN D, KIM H, et al. Gaussian YOLOv3: an accurate and fast object detector using localization uncertainty for autonomous driving[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 502-511.
[26] YU F, CHEN H F, WANG X, et al. BDD100K: a diverse driving dataset for heterogeneous multitask learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 2633-2642.
[27] WANG C Y, LIAO H Y M, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 1571-1580.
[28] DING X H, GUO Y C, DING G G, et al. ACNet: strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 1911-1920.
[29] STERGIOU A, POPPE R, KALLIATAKIS G. Refining activation downsampling with SoftPool[J]. arXiv:2101.00440, 2021.
[30] LIN T Y, DOLLÁR P, GIRSHICK R B, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 936-944.
[31] LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-23, 2018. Piscataway: IEEE, 2018: 8759-8768.
[32] BEHRENDT K, NOVAK L, BOTROS R. A deep learning approach to traffic lights: detection, tracking, and classification[C]// Proceedings of the 2017 IEEE International Conference on Robotics and Automation, Singapore, May 29-Jun 3, 2017. Piscataway: IEEE, 2017: 1370-1377.
[33] CHENG Z W, WANG Z Y, HUANG H C, et al. Dense-ACSSD for end-to-end traffic scenes recognition[C]// Proceedings of the 2019 IEEE Intelligent Vehicles Symposium, Paris, Jun 9-12, 2019. Piscataway: IEEE, 2019: 460-465.