Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (6): 1049-1061. DOI: 10.3778/j.issn.1673-9418.2007002

• Academic Research •

Spatio-Temporal Correlation Based Adaptive Feature Learning of Tracking Object

GUO Mingzhe, CAI Zixin, WANG Xinyue, JING Liping, YU Jian

  1. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
  • Online: 2021-06-01 Published: 2021-06-03

Abstract:

Object tracking has been a difficult research problem in computer vision in recent years. Its core task is to locate an object continuously throughout a video sequence and to mark its position with a bounding box. Most existing tracking methods follow the idea of object detection: they split the video sequence into frames and detect the target in each frame separately. Although this strategy makes full use of the information in the current frame, it ignores the spatio-temporal correlation among frames, which is the key to adapting to changes in the target's appearance and to detecting and representing the target completely. To solve this problem, this paper proposes the spatio-temporal siamese network (STSiam), an adaptive feature learning framework for tracked objects based on spatio-temporal correlation. STSiam uses the spatio-temporal correlation within a video sequence to locate the target accurately and track it in real time in two stages: object localization and object representation. In the object localization stage, STSiam adaptively captures the changing features of the target and its surrounding area and updates the target matching template, so that the template is affected as little as possible by appearance changes. In the object representation stage, STSiam attends to the spatial correlation between corresponding regions in different frames; it uses the result of object localization to lock onto the target region and learns bounding box correction parameters, so that the bounding box fits the target as closely as possible. The network is trained offline, and no model parameters need to be updated during online tracking, which preserves real-time tracking speed. Experiments on the widely used benchmarks OTB2015, VOT2016, VOT2018 and LaSOT show that STSiam achieves leading performance in accuracy, robustness and speed compared with existing methods.

Key words: spatio-temporal correlation, feature, tracking, object localization, object representation
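
The localization stage described in the abstract (match a template against each new frame, then adapt the template to appearance changes without retraining) can be illustrated with a minimal sketch. This is not the STSiam implementation: it assumes raw grayscale pixels in place of learned siamese features, normalized cross-correlation as the matching score, and an exponential moving average as the template update. The function names and the `rate` parameter are hypothetical, and the bounding box refinement of the representation stage is omitted.

```python
import numpy as np


def cross_correlate(search: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Slide `template` over `search`; return normalized correlation scores."""
    th, tw = template.shape
    sh, sw = search.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t) + 1e-8  # epsilon avoids division by zero
    scores = np.zeros((sh - th + 1, sw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = search[y:y + th, x:x + tw]
            p = patch - patch.mean()
            p_norm = np.linalg.norm(p) + 1e-8
            scores[y, x] = float((p * t).sum()) / (t_norm * p_norm)
    return scores


def locate(search: np.ndarray, template: np.ndarray) -> tuple:
    """Return the top-left (row, col) of the best-matching window."""
    scores = cross_correlate(search, template)
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(y), int(x)


def update_template(template: np.ndarray, patch: np.ndarray,
                    rate: float = 0.1) -> np.ndarray:
    """Blend the newly matched patch into the template (assumed EMA update)."""
    return (1.0 - rate) * template + rate * patch


def track(frames, first_box, rate: float = 0.1):
    """Track a fixed-size box (row, col, height, width) through `frames`."""
    y, x, h, w = first_box
    template = frames[0][y:y + h, x:x + w].astype(float)
    boxes = [first_box]
    for frame in frames[1:]:
        frame = frame.astype(float)
        y, x = locate(frame, template)
        # Only the template changes online; no model parameters are trained.
        template = update_template(template, frame[y:y + h, x:x + w], rate)
        boxes.append((y, x, h, w))
    return boxes
```

Calling `track(frames, (row, col, height, width))` with a list of 2-D arrays returns one box per frame. The box size stays fixed in this sketch, which is exactly the limitation that a box correction step such as the one described in the abstract is meant to address.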