Traffic Sign Semantic Segmentation Based on Convolutional Neural Network

doi:10.3778/j.issn.1673-9418.2005060

Abstract

Image semantic segmentation is an essential part of modern autonomous driving systems, because capturing road condition information accurately and in real time is key to navigation and action planning. Traffic signs are an important source of road condition information, and a traffic sign semantic segmentation algorithm that is stable, fast, and accurate enough for practical use is the basis for active safe driving systems and automatic driving systems. First, based on an analysis of practical application needs, the GTSDB database is selected as the source data, and a traffic sign dataset is designed that can comprehensively evaluate the performance of semantic segmentation algorithms. Then, building on U-Net, a classical semantic segmentation network with stable performance, D-Unet ("D" stands for dilated convolution) is proposed: a deep neural network structure with better segmentation performance and higher real-time performance on small targets such as traffic signs. The method uses fewer pooling layers to retain more image information, and replaces conventional convolution with dilated convolution to enlarge the receptive field and better integrate global information. Finally, in tests on the dataset designed in this paper, the mean intersection over union (MIoU) of the proposed model is about 11.9, 6.09, and 3.71 percentage points higher than that of FCN-8s, SegNet, and U-Net, respectively, and its parameter count is only 4.94%, 22.5%, and 85.5% of theirs.
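The receptive-field argument in the abstract can be illustrated with a small calculation. The sketch below is not the D-Unet architecture: the layer stacks and the helper name `receptive_field` are assumptions chosen only to show that, for the same 3x3 kernels and hence the same number of weights, increasing the dilation rate widens the region of the input that each output pixel sees.

```python
def receptive_field(layers):
    """Receptive field, in input pixels along one axis, of a layer stack.

    Each layer is a (kernel_size, stride, dilation) tuple. This is the
    standard recurrence for convolution and pooling layers; the stacks
    used below are illustrative, not taken from the paper.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between neighbouring outputs
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Three conventional 3x3 convolutions (hypothetical stack).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
# Same kernels, but with dilation rates 1, 2, 4 (hypothetical stack).
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

Under these assumed stacks the receptive field grows from 7 to 15 pixels without any pooling, which is the trade-off the abstract describes: context is gained through dilation rather than through downsampling that would discard detail on small targets.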

Key words: road traffic sign, deep learning, semantic segmentation, dilated convolution
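The MIoU figures quoted in the abstract are per-class intersection over union values averaged across classes. The following is a minimal sketch of that metric on tiny label maps; the function name `mean_iou`, the two-class toy masks, and the choice to skip classes absent from both maps are assumptions, and the paper's exact evaluation protocol may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two label maps of equal shape.

    pred and target are lists of rows of integer class ids. Classes that
    appear in neither map are skipped (an assumption of this sketch).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts once toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Toy example: class 0 is background, class 1 is a small "sign" region.
target = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
]

print(round(mean_iou(pred, target, num_classes=2), 4))  # 0.6889
```

In this toy case the background IoU is 7/9 and the sign IoU is 3/5, so two misplaced pixels cost the small class far more than the large one. This is why MIoU is a demanding metric for small targets such as traffic signs.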

MA Yu, ZHANG Liguo, DU Huimin, MAO Zhili. Traffic Sign Semantic Segmentation Based on Convolutional Neural Network[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(6): 1114-1121.

马宇, 张丽果, 杜慧敏, 毛智礼. 卷积神经网络的交通标志语义分割[J]. 计算机科学与探索, 2021, 15(6): 1114-1121.
