Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (11): 2676-2688. DOI: 10.3778/j.issn.1673-9418.2212065

• Graphics·Image •

融合注意力机制和上下文信息的实时交通标志检测算法

冯爱棋,吴小俊,徐天阳   

  1. 江南大学 人工智能与计算机学院 江苏省模式识别与计算智能工程实验室,江苏 无锡 214122
  • Publication date: 2023-11-01 Release date: 2023-11-01

Real-Time Traffic Sign Detection Algorithm Combining Attention Mechanism and Contextual Information

FENG Aiqi, WU Xiaojun, XU Tianyang   

  1. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2023-11-01 Published: 2023-11-01

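Abstract (Chinese version): Traffic sign detection has attracted wide attention in recent years. However, existing methods often cannot meet real-time requirements and miss many small-scale traffic signs. To address this, a real-time traffic sign detection algorithm that fuses an attention mechanism with contextual information is proposed. The algorithm takes YOLOv5 as its baseline model. First, a spatial attention mechanism is embedded in the backbone network to adaptively strengthen features at key positions and suppress interfering information, improving the backbone's feature extraction capability. Second, a cross stage partial window Transformer module is designed to learn the correlations between different positions and capture the rich contextual information around traffic signs, which helps improve detection accuracy for small-scale signs. Third, a lightweight feature fusion network is proposed to fuse feature maps of different scales, reducing computation while still ensuring effective feature fusion. Finally, a Gaussian weighted fusion method is proposed for the post-processing stage to refine the detection boxes and further improve localization accuracy. Experiments on the TT100K and DFG traffic sign detection datasets show that the proposed algorithm effectively reduces missed detections of small-scale traffic signs, achieves high accuracy and real-time performance, and can meet the needs of traffic sign detection in real-world scenarios.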

Key words (Chinese version): traffic sign detection, YOLOv5, attention mechanism, Transformer, contextual information

Abstract: Traffic sign detection has attracted widespread attention in recent years. However, existing methods often fail to meet real-time requirements and frequently miss small-scale traffic signs. To solve these problems, a real-time traffic sign detection algorithm combining an attention mechanism and contextual information is proposed. YOLOv5 is used as the base model. Firstly, a spatial attention mechanism is embedded in the backbone to adaptively enhance the features at important positions and suppress interfering information, which improves the feature extraction capability of the backbone network. Secondly, a cross stage partial window Transformer module is designed to learn the correlations between different locations and to capture rich contextual information around traffic signs, which helps improve the detection accuracy for small-scale traffic signs. Thirdly, a lightweight feature fusion network is proposed to fuse feature maps of different scales, reducing the computational burden while still ensuring effective feature fusion. Finally, in the post-processing stage, Gaussian weighted fusion is used to refine the predicted boxes and further improve localization accuracy. Experiments on the TT100K and DFG traffic sign detection datasets show that the proposed method effectively reduces missed detections of small-scale traffic signs, achieves high accuracy and real-time performance, and can meet the requirements of traffic sign detection in real-world scenarios.

Key words: traffic sign detection, YOLOv5, attention mechanism, Transformer, contextual information
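
The abstract does not give the formula behind the Gaussian weighted fusion post-processing step, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a greedy, NMS-style grouping in which each kept box is replaced by a weighted average of the boxes that overlap it, with the weight `score * exp(-(1 - IoU)^2 / sigma)`. The function names, the weighting rule, and the parameters `iou_thr` and `sigma` are all hypothetical choices made for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def gaussian_weighted_fusion(boxes, scores, iou_thr=0.5, sigma=0.1):
    """Greedy grouping by IoU, then a Gaussian-weighted average per group.

    Assumed weighting (hypothetical, not taken from the paper):
        w_j = score_j * exp(-(1 - IoU(top, box_j))**2 / sigma)
    Returns a list of (fused_box, top_score) pairs.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    used = [False] * len(boxes)
    results = []
    for i in order:
        if used[i]:
            continue
        # The top-scoring box always belongs to its own group (overlap 1.0).
        group = [(i, 1.0)]
        used[i] = True
        for j in order:
            if used[j]:
                continue
            overlap = iou(boxes[i], boxes[j])
            if overlap >= iou_thr:
                group.append((j, overlap))
                used[j] = True
        weights = [scores[j] * math.exp(-((1.0 - o) ** 2) / sigma) for j, o in group]
        total = sum(weights)
        if total > 0:
            fused = tuple(
                sum(w * boxes[j][k] for w, (j, _) in zip(weights, group)) / total
                for k in range(4)
            )
        else:
            # All weights vanished (e.g. zero scores): keep the top box as-is.
            fused = tuple(float(v) for v in boxes[i])
        results.append((fused, scores[i]))
    return results
```

Unlike plain non-maximum suppression, which discards the lower-scoring overlapping boxes, this variant lets them pull the kept box's coordinates toward a consensus position. Boxes that overlap the top box less contribute exponentially less.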