Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (9): 2161-2173. DOI: 10.3778/j.issn.1673-9418.2208034

• Graphics · Image •

Transformer Object Tracking Algorithm Based on Spatio-Temporal Template Update

WANG Qiang, LU Xianling   

  1. Key Laboratory for Advanced Process Control for Light Industry of the Ministry of Education, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2023-09-01 Published: 2023-09-01

Abstract: Current mainstream Transformer tracking algorithms use the Transformer only for feature enhancement and feature fusion, ignoring its feature extraction ability, and they lack an effective template update strategy for disturbances such as scale change and deformation during tracking. To address these problems, a Transformer tracking algorithm based on spatio-temporal template update and bounding box refinement is proposed. First, an improved Swin Transformer is used as the backbone network; self-attention computation and global information modeling are performed with shifted windows to enhance the backbone's feature extraction ability. Second, a Transformer encoder-decoder structure fuses the template region and search region information, and the attention mechanism establishes feature correlations to obtain global semantic information. During tracking, the template is dynamically updated at fixed frame intervals according to the confidence score, to adjust the appearance state of the template. Finally, a bounding box refinement module refines the regression range of the bounding box and improves the accuracy of the algorithm. Performance comparison experiments against mainstream state-of-the-art algorithms are carried out on multiple challenging datasets. On the OTB2015 dataset, the success rate and precision reach 70.2% and 91.0%, respectively. Compared with the baseline algorithm TransT, the average overlap on the GOT-10k dataset improves by 0.02 and the success rate on the LaSOT dataset improves by 0.024. The algorithm also tracks in real time at 42 FPS.

Key words: object tracking, Transformer network, spatio-temporal template, bounding box refinement
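
The template update rule described in the abstract (check at a fixed frame interval, decide by confidence score) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TemplateUpdater`, the interval of 200 frames, the threshold of 0.5, and the choice to keep the initial template alongside one dynamic template are all assumptions made for illustration, since the abstract gives none of these details.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TemplateUpdater:
    """Hypothetical helper: holds a fixed initial template and one dynamic template."""

    initial_template: Any
    update_interval: int = 200          # assumed; the abstract only says "fixed frame intervals"
    confidence_threshold: float = 0.5   # assumed; the abstract gives no threshold
    dynamic_template: Any = None

    def __post_init__(self) -> None:
        # Before any update, the dynamic template is just the initial one.
        if self.dynamic_template is None:
            self.dynamic_template = self.initial_template

    def step(self, frame_index: int, candidate: Any, confidence: float) -> bool:
        """Consider replacing the dynamic template at this frame.

        Returns True if the dynamic template was replaced.
        """
        at_interval = frame_index > 0 and frame_index % self.update_interval == 0
        if at_interval and confidence > self.confidence_threshold:
            self.dynamic_template = candidate
            return True
        return False


# Usage: strings stand in for cropped template images.
updater = TemplateUpdater(initial_template="t0", update_interval=10)
for idx, (crop, score) in enumerate([("c%d" % i, 0.9) for i in range(25)]):
    updater.step(idx, crop, score)
```

In this sketch a low-confidence prediction at an update frame leaves the previous dynamic template in place, so an occluded or drifted result does not overwrite a good template. Whether the paper uses a threshold or some other comparison of confidence scores is not stated in the abstract.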