Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (6): 1049-1061. DOI: 10.3778/j.issn.1673-9418.2007002

• Science Researches •

Spatio-Temporal Correlation Based Adaptive Feature Learning of Tracking Object

GUO Mingzhe, CAI Zixin, WANG Xinyue, JING Liping, YU Jian   

  1. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
  • Online: 2021-06-01 Published: 2021-06-03





Object tracking has been a challenging problem in computer vision in recent years. Its core task is to continuously locate an object in a video sequence and mark its location with a bounding box. Most existing tracking methods borrow the idea of object detection: they split the video sequence into frames and detect the target in each frame independently. Although this strategy makes full use of the information in the current frame, it ignores the spatio-temporal correlation among frames, which is key to adapting to changes in the target's appearance and to fully representing the target. To solve this problem, this paper proposes a spatio-temporal siamese network (STSiam) based on spatio-temporal correlation. STSiam uses spatio-temporal correlation information to locate the target and track it in real time in two stages: object localization and object representation. In the object localization stage, STSiam adaptively captures the features of the target and its surrounding area and updates the target matching template so that matching is not affected by appearance changes. In the object representation stage, STSiam attends to the spatial correlation between corresponding regions in different frames. Using the result of object localization, STSiam locates the target area and learns correction parameters for the target bounding box so that the box fits the target as closely as possible. The network is trained offline, and no model parameters need to be updated during online tracking, which preserves real-time tracking speed. Extensive experiments on visual tracking benchmarks including OTB2015, VOT2016, VOT2018 and LaSOT demonstrate that STSiam achieves state-of-the-art performance in terms of accuracy, robustness and speed compared with existing methods.
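The template-matching idea that Siamese trackers build on can be illustrated with a small sketch. This is not the STSiam architecture, whose details are not given in this abstract. It only shows cross-correlating a target template with a search region, taking the peak as the target location, and blending the matched patch back into the template. The function names, the use of raw array values in place of learned features, and the running-average update with its `rate` parameter are all assumptions made for illustration.

```python
import numpy as np


def cross_correlate(search, template):
    """Slide `template` over `search` and return the map of dot-product scores."""
    sh, sw = search.shape
    th, tw = template.shape
    scores = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            window = search[i:i + th, j:j + tw]
            scores[i, j] = np.sum(window * template)
    return scores


def locate(search, template):
    """Return the (row, col) of the top-left corner of the best-matching window."""
    scores = cross_correlate(search, template)
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return int(row), int(col)


def update_template(template, matched_patch, rate=0.1):
    """Hypothetical update rule: running average of old template and new patch."""
    return (1.0 - rate) * template + rate * matched_patch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.standard_normal((4, 4))   # stand-in for target features
    search = np.zeros((12, 12))              # stand-in for a search region
    search[5:9, 3:7] = template              # place the target in the frame

    row, col = locate(search, template)
    patch = search[row:row + 4, col:col + 4]
    template = update_template(template, patch, rate=0.1)
    print("estimated top-left corner:", (row, col))
```

A real tracker would compute the correlation on learned feature maps rather than raw values and would add a regression step for the bounding box. The fixed `rate` here only stands in for whatever adaptive update the model actually learns.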

Key words: spatio-temporal correlation, feature, tracking, object localization, object representation


