Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2023, Vol. 17, Issue (2): 396-408. DOI: 10.3778/j.issn.1673-9418.2105091

• Graphics·Image •

特征增强的孪生网络高速跟踪算法

李虹瑾,彭力   

  1. 物联网技术应用教育部工程研究中心(江南大学 物联网工程学院),江苏 无锡 214122
  • 出版日期:2023-02-01 发布日期:2023-02-01

High-Speed Tracking Algorithm Based on Siamese Network with Enhanced Features

LI Hongjin, PENG Li   

  1. Engineering Research Center of Internet of Things Technology Applications (School of Internet of Things Engineering, Jiangnan University), Ministry of Education, Wuxi, Jiangsu 214122, China
  • Online: 2023-02-01 Published: 2023-02-01

摘要: 近年来,实时的目标跟踪技术在许多复杂视觉系统中都发挥了重要的作用,跟踪算法作为其中的一个关键环节,不仅需要具备高精度还需要满足实时性。SiamFC算法在提出时由于可以较好地平衡精度与速度,受到了广泛的关注。但是SiamFC算法使用较浅的骨干网络,提取到的特征难以应对复杂多变的跟踪环境,容易导致跟踪漂移。为了同时提高算法的跟踪精度与速度,提出了一种特征增强的轻量级孪生网络高速跟踪算法。首先,使用改进后的轻量级网络ShuffleNetV2作为骨干网络提取特征,在减少模型参数量与计算量的同时大幅提升跟踪速度;其次,在孪生网络的模板分支末端嵌入通道与空间双重注意力来调整不同通道和空间位置的响应权重,突出对跟踪有益的特征;最后,采用分层特征融合策略,同时利用网络提取的深层语义特征与浅层结构特征,从多角度表征目标。在OTB100和VOT2018两个数据集上与当前一些优秀的跟踪算法进行对比实验,结果表明,所提算法在跟踪精度上有较大的优势,在困难场景下展现了较强的鲁棒性,同时算法在NVIDIA GTX1070下的速度可达110 FPS,相比SiamFC算法能够更好地兼顾跟踪精度与速度。

关键词: 目标跟踪, 孪生网络, 注意力机制, 特征融合

Abstract: In recent years, real-time object tracking has played an important role in many complex vision systems. As a key component of such systems, a tracking algorithm must be both highly accurate and fast enough to run in real time. When it was proposed, SiamFC attracted wide attention because it balanced accuracy and speed well. However, SiamFC uses a shallow backbone network, and the features it extracts cope poorly with complex, changing tracking scenarios, so the tracker drifts easily. To improve tracking accuracy and speed at the same time, a high-speed tracking algorithm based on a lightweight Siamese network with enhanced features is proposed. First, an improved version of the lightweight network ShuffleNetV2 is used as the backbone to extract features, which reduces the number of model parameters and the amount of computation while greatly increasing tracking speed. Second, a dual attention module combining channel attention and spatial attention is embedded at the end of the template branch of the Siamese network to adjust the response weights of different channels and spatial positions, highlighting the features that benefit tracking. Finally, a hierarchical feature fusion strategy is adopted that uses both the deep semantic features and the shallow structural features extracted by the network to represent the target from multiple perspectives. Comparative experiments against several strong current trackers on the OTB100 and VOT2018 datasets show that the proposed algorithm has a considerable advantage in tracking accuracy and shows strong robustness in difficult scenarios. The algorithm also reaches 110 FPS on an NVIDIA GTX1070 and balances tracking accuracy and speed better than SiamFC.

Key words: object tracking, Siamese network, attention mechanism, feature fusion
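
The pipeline summarized in the abstract (attention on the template branch, SiamFC-style cross-correlation, and fusion of shallow and deep responses) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Random arrays stand in for ShuffleNetV2 features, and the attention functions are parameter-free stand-ins for the learned modules. The fusion rule (a weighted sum of normalized response maps of equal size) and all function names are assumptions made for illustration only.

```python
import numpy as np


def channel_attention(feat):
    """Reweight channels by a sigmoid of their global average response.

    Assumption: parameter-free stand-in for a learned channel-attention module.
    """
    pooled = feat.mean(axis=(1, 2))              # one value per channel, shape (C,)
    weights = 1.0 / (1.0 + np.exp(-pooled))      # sigmoid
    return feat * weights[:, None, None]


def spatial_attention(feat):
    """Reweight spatial positions by a sigmoid of the channel-mean map.

    Assumption: parameter-free stand-in for a learned spatial-attention module.
    """
    pooled = feat.mean(axis=0)                   # shape (H, W)
    weights = 1.0 / (1.0 + np.exp(-pooled))
    return feat * weights[None, :, :]


def cross_correlate(template, search):
    """Slide the template over the search features to get a response map."""
    _, th, tw = template.shape
    _, sh, sw = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[:, i:i + th, j:j + tw])
    return out


def fused_response(templates, searches, layer_weights):
    """Weighted sum of per-layer response maps (hypothetical fusion rule).

    All layers are assumed to yield response maps of the same spatial size.
    """
    maps = []
    for z, x in zip(templates, searches):
        z = spatial_attention(channel_attention(z))   # template branch only
        r = cross_correlate(z, x)
        maps.append((r - r.mean()) / (r.std() + 1e-8))  # put layers on one scale
    return sum(w * m for w, m in zip(layer_weights, maps))


# Toy data: random "shallow" and "deep" feature maps of the search region.
rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 12, 12))
deep = rng.standard_normal((16, 12, 12))

# The template is cropped from a known location so the peak can be checked.
top, left, size = 3, 5, 4
templates = [f[:, top:top + size, left:left + size] for f in (shallow, deep)]

resp = fused_response(templates, [shallow, deep], layer_weights=[0.5, 0.5])
peak = tuple(int(v) for v in np.unravel_index(np.argmax(resp), resp.shape))
print("response map shape:", resp.shape)
print("peak location:", peak, "crop location:", (top, left))
```

In this toy setup the response peak should coincide with the location the template was cropped from. A real tracker would replace the random arrays with backbone features and the explicit loops with a batched convolution.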