Real-Time Traffic Sign Detection Algorithm Combining Attention Mechanism and Contextual Information

doi:10.3778/j.issn.1673-9418.2212065

Abstract

Traffic sign detection has attracted wide attention in recent years. However, existing methods often fail to meet real-time requirements and frequently miss small-scale traffic signs. To address these problems, a real-time traffic sign detection algorithm that combines an attention mechanism with contextual information is proposed. The algorithm takes YOLOv5 as the baseline model. First, a spatial attention mechanism is embedded in the backbone network to adaptively strengthen features at key positions and suppress interfering information, which improves the feature extraction capability of the backbone. Second, a cross stage partial window Transformer module is designed to learn the correlations between different positions and capture the rich contextual information around traffic signs, which helps improve detection accuracy for small-scale signs. Third, a lightweight feature fusion network is proposed to fuse feature maps of different scales, reducing computation while still ensuring effective feature fusion. Finally, a Gaussian weighted fusion method is proposed for the post-processing stage to refine the detection boxes and further improve localization accuracy. Experiments on the TT100K and DFG traffic sign detection datasets show that the proposed algorithm effectively reduces missed detections of small-scale traffic signs, achieves high accuracy and real-time performance, and can meet the requirements of traffic sign detection in real-world scenarios.

Key words: traffic sign detection, YOLOv5, attention mechanism, Transformer, contextual information
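
The abstract does not state the formula behind the Gaussian weighted fusion step, so the following is only a sketch of one plausible form, not the authors' implementation. It assumes that boxes overlapping the highest-scoring box are merged by a weighted average of their coordinates, with each weight equal to the confidence score multiplied by a Gaussian of (1 - IoU). The function names `iou` and `gaussian_weighted_fusion` and the parameters `iou_thr` and `sigma` are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def gaussian_weighted_fusion(boxes, scores, iou_thr=0.5, sigma=0.5):
    """Fuse overlapping boxes instead of discarding them (assumed scheme).

    Boxes are visited in descending score order. Each unused box gathers the
    unused boxes whose IoU with it is at least iou_thr, and the cluster is
    replaced by a weighted average of its coordinates. The weight of a box is
    score * exp(-(1 - IoU)^2 / sigma), so close, confident boxes dominate.
    Returns a list of (fused_box, score_of_top_box) pairs.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    used = set()
    fused = []
    for i in order:
        if i in used:
            continue
        # The top box always belongs to its own cluster.
        cluster = [i] + [
            j for j in order
            if j != i and j not in used and iou(boxes[i], boxes[j]) >= iou_thr
        ]
        used.update(cluster)
        weights = [
            scores[j] * math.exp(-((1.0 - iou(boxes[i], boxes[j])) ** 2) / sigma)
            if j != i else scores[i]
            for j in cluster
        ]
        total = sum(weights)
        box = tuple(
            sum(w * boxes[j][k] for w, j in zip(weights, cluster)) / total
            for k in range(4)
        )
        fused.append((box, scores[i]))
    return fused


if __name__ == "__main__":
    # Two overlapping predictions for one sign plus a separate prediction.
    demo_boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (50.0, 50.0, 60.0, 60.0)]
    demo_scores = [0.9, 0.6, 0.8]
    for fused_box, score in gaussian_weighted_fusion(demo_boxes, demo_scores):
        print(fused_box, score)
```

Compared with standard non-maximum suppression, which keeps only the top-scoring box of each cluster, this kind of fusion lets lower-scoring neighbors nudge the final coordinates, which is one way a post-processing step could improve localization as the abstract describes.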

FENG Aiqi (冯爱棋), WU Xiaojun (吴小俊), XU Tianyang (徐天阳). Real-Time Traffic Sign Detection Algorithm Combining Attention Mechanism and Contextual Information[J]. Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2023, 17(11): 2676-2688.
