Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (12): 2718-2733. DOI: 10.3778/j.issn.1673-9418.2204041
Survey of Deep Online Multi-object Tracking Algorithms

Corresponding author E-mail: 13952004682@139.com
LIU Wenqiang, QIU Hangping(), LI Hang, YANG Li, LI Yang, MIAO Zhuang, LI Yi, ZHAO Xinxin
Received: 2022-04-13
Revised: 2022-07-13
Online: 2022-12-01
Published: 2022-12-16
About author: LIU Wenqiang, born in 1996 in Shangrao, Jiangxi, M.S. candidate. His research interests include machine vision and multi-object tracking.
Abstract: Video multi-object tracking is a key task in computer vision, with broad application prospects in industrial, commercial, and military settings. The rapid development of deep learning has produced a variety of solutions to the multi-object tracking problem, yet challenging issues such as abrupt changes in target appearance, severe occlusion of target regions, and the disappearance and reappearance of targets are still not fully solved. This survey focuses on deep-learning-based online multi-object tracking algorithms and summarizes the latest progress in the field. According to three key modules (target feature prediction, appearance feature extraction, and data association) and the two classic frameworks of detection-based tracking (DBT) and joint detection and tracking (JDT), deep online multi-object tracking algorithms are divided into six subcategories, and the principles, strengths, and weaknesses of each category are discussed. The multi-stage design of DBT algorithms is clearly structured and easy to optimize, but multi-stage training may lead to sub-optimal solutions; JDT algorithms fuse the detection and tracking sub-modules to reach faster inference speeds, but face the problem of training those modules jointly. Current research increasingly attends to long-term feature extraction, handling of occluded targets, improved association strategies, and the design of end-to-end frameworks. Finally, drawing on existing algorithms, pressing open problems in deep online multi-object tracking are summarized and possible future research directions are discussed.
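The data-association module named above is shared by DBT and JDT pipelines alike. As a hedged, illustrative sketch (not code from any surveyed tracker; the function and parameter names are invented), matching predicted track boxes to new detections by IoU with the Hungarian algorithm might look like:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(a, b):
    # a, b: boxes as [x1, y1, x2, y2]
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def associate(tracks, detections, iou_min=0.3):
    """Match predicted track boxes to detections by maximizing total IoU."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    # Hungarian algorithm minimizes cost, so use (1 - IoU) as the cost.
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_min]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_dets = [j for j in range(len(detections)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

SORT-style trackers [15] run essentially this step once per frame after a motion model predicts each track box; unmatched detections then start new tracks, and unmatched tracks are kept alive for a few frames before deletion.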
LIU Wenqiang, QIU Hangping, LI Hang, YANG Li, LI Yang, MIAO Zhuang, LI Yi, ZHAO Xinxin. Survey of Deep Online Multi-object Tracking Algorithms[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(12): 2718-2733.
Category | Approach | Strengths | Weaknesses |
---|---|---|---|
DBE | Crops target patches for classification learning | Trained separately; can use large amounts of external training data and flexibly combine the best off-the-shelf detection, ReID, and association models | Multi-stage training is complex and may yield sub-optimal solutions; individually cropped images as samples cannot capture the spatio-temporal information of trajectories |
DBP | Extracts single-frame features, then aggregates multi-frame appearance and motion features | Can jointly learn targets' motion and appearance features along with their spatio-temporal consistency | Needs an extra temporal network to learn interactions among multi-frame trajectory features, increasing training and inference cost |
DBA | Uses the assignment matrix to supervise learning of motion and appearance features | Taking the assignment matrix as the training target aligns training with inference, allows learning of inter-trajectory interactions, and steers motion and appearance features toward better tracking metrics | Further increases complexity; research remains preliminary and needs further exploration |
JDE | Adds a head network to the detector to extract appearance features | A single shared backbone performs detection and appearance-feature extraction simultaneously, can be extended to aggregate multi-frame appearance features, and simplifies the tracking framework | Appearance features are usually learned with a multi-class classification task, so the head classifier changes as the number of trajectories changes; as a multi-task framework, conflicts between tasks remain prominent |
JDP | Jointly learns cross-frame appearance and motion features in the network head and predicts target displacement between two frames | Learns trajectory features through multi-frame aggregation and predicts target displacement from visual information, without relying on prior motion models | Owing to computational cost, such joint cross-frame modeling usually covers only a few adjacent frames; motion features learned from short tracklets are not robust and cannot model long-term dependencies |
JDA | Directly models trajectory (query) features that can localize and identify targets, completing end-to-end tracking from detection to association | Enables fully end-to-end tracking; trajectory features can learn global information, including spatio-temporal consistency, trajectory-background correlation, and inter-trajectory interactions | Under long occlusion, trajectory embeddings tend to drift, causing the tracker to gradually lose targets |
Table 1 Summary of characteristics of different types of online multi-object tracking algorithms
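The DBE and JDE categories above both rely on appearance (ReID) embeddings for association. A hedged sketch of how such embeddings typically enter the association cost (the names are illustrative, not from any surveyed tracker):

```python
import numpy as np


def cosine_cost(track_emb, det_emb):
    """Cosine-distance cost between track and detection appearance embeddings.

    track_emb: (T, D) array, det_emb: (N, D) array. Rows need not be
    pre-normalized; they are L2-normalized here.
    """
    t = track_emb / (np.linalg.norm(track_emb, axis=1, keepdims=True) + 1e-12)
    d = det_emb / (np.linalg.norm(det_emb, axis=1, keepdims=True) + 1e-12)
    # 0 = identical direction, 2 = opposite direction
    return 1.0 - t @ d.T
```

In practice this appearance cost is usually fused with a motion cost (e.g., IoU or Mahalanobis distance) before the assignment step, as in DeepSORT [26].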
Dataset | Year | Characteristics | Link |
---|---|---|---|
DanceTrack[ | 2021 | Tracks dancers on stage; complex motion patterns, large movements, large appearance variation of a single target, while multiple targets wear identical costumes and look alike | |
CroHD[ | 2021 | Provides pedestrian head annotations to mitigate severe occlusion; covers crowded indoor and outdoor scenes from elevated viewpoints | |
MOT dataset[ | 2015-2020 | Centralized benchmark for multi-object tracking, comprising several datasets | |
MOTS[ | 2019 | Multi-object tracking and segmentation dataset; pixel-level annotations on part of the KITTI and MOT17 data | |
VisDrone[ | 2021 | Multi-object tracking dataset captured from UAV viewpoints | |
UA-DETRAC[ | 2020 | Multi-class vehicle detection and tracking annotations in diverse scenes | |
KITTI-Tracking[ | 2013 | Pedestrian and vehicle tracking dataset in sparse scenes | |
KIT AIS | 2012 | Vehicle and pedestrian tracking dataset of aerial image sequences | KIT-IPF-Datensätze und Software |
TownCentre | 2009 | Street-view pedestrian tracking dataset; simple scenes, complete annotations, clear footage, relatively little data | |
Table 2 Multi-object tracking datasets
Category | Method | Detection | Data | MOTA/%↑ | IDF1/%↑ | HOTA/%↑ | FP↓ | FN↓ | IDs↓ | FPS↑ |
---|---|---|---|---|---|---|---|---|---|---|
DBE | HISP[ | public | no | 45.4 | 39.9 | 34.0 | 21 820 | 277 473 | 1 194 | 3.2 |
GM-PHD[ | public | no | 46.8 | 54.1 | 41.5 | 38 452 | 257 678 | 3 865 | 30.8 | |
OTCD[ | public | CP | 48.6 | 47.9 | 38.4 | 18 499 | 268 204 | 3 502 | 15.5 | |
MOTDT[ | public | no | 50.9 | 52.7 | 41.2 | 24 069 | 250 768 | 2 474 | 18.3 | |
UnsupTrack[ | public | no | 61.7 | 58.1 | 46.9 | 16 872 | 197 632 | 1 864 | 2.0 | |
StrongSORT[ | private | CH | 79.6 | 79.5 | 64.4 | 27 876 | 86 205 | 1 194 | 7.1 | |
DBP | Sp_Con[ | public | no | 61.5 | 63.3 | 50.5 | 14 056 | 200 655 | 2 478 | 7.7 |
TrajE[ | public | no | 67.4 | 61.2 | 49.7 | 18 652 | 161 347 | 4 019 | 1.4 | |
FUFET[ | private | 5D1 | 76.2 | 68.0 | 57.9 | 32 796 | 98 475 | 3 237 | 6.8 | |
DBA | DAN[ | private | no | 52.4 | 49.5 | 39.3 | 25 423 | 234 592 | 8 431 | <3.9 |
DeepMOT[ | public | no | 53.7 | 53.8 | 42.4 | 11 731 | 247 447 | 1 947 | 4.9 | |
GCNNMatch[ | public | no | 57.3 | 56.3 | 45.4 | 14 100 | 225 042 | 1 911 | 1.3 | |
JDE | OUTrack[ | public | CH | 69.0 | 66.8 | 54.8 | 28 795 | 141 580 | 4 472 | 27.6 |
GSDT[ | private | 5D2 | 73.2 | 66.5 | 55.2 | 26 397 | 120 666 | 3 891 | 4.9 | |
FairMOT[ | private | 5D1 | 73.7 | 72.3 | 59.3 | 27 507 | 117 477 | 3 303 | 25.9 | |
CSTrack[ | private | 5D2 | 74.9 | 72.6 | 59.3 | 23 847 | 114 303 | 3 567 | 15.8 | |
Corrtracker[ | private | 5D1 | 76.5 | 73.6 | 60.7 | 29 808 | 99 510 | 3 369 | 15.6 | |
JDP | Tracktor++v2[ | public | no | 56.3 | 55.1 | 44.8 | 8 866 | 235 449 | 1 987 | 1.5 |
CenterTrack[ | private | CH | 61.5 | 59.6 | 48.2 | 14 076 | 200 672 | 2 583 | 17.0 | |
Tube_TK[ | private | no | 63.0 | 58.6 | 48.0 | 27 060 | 177 483 | 4 137 | 3.0 | |
Chained-Tracker[ | private | no | 66.6 | 57.4 | 49.0 | 22 284 | 160 491 | 5 529 | 6.8 | |
TransCenter[ | private | no | 70.0 | 62.1 | 52.1 | 28 119 | 136 722 | 4 647 | 1.0 | |
JDA | DASOT[ | public | no | 49.5 | 51.8 | 41.5 | 33 640 | 247 370 | 4 142 | 9.1 |
MOTR[ | private | no | 71.9 | 68.4 | 57.2 | 21 123 | 135 561 | 2 115 | 7.5 | |
TrackFormer[ | private | CH | 74.1 | 68.0 | 57.3 | 34 602 | 108 777 | 2 829 | 5.7 | |
TransTrack[ | private | CH | 75.2 | 63.5 | 54.1 | 50 157 | 86 442 | 3 603 | 10.0 | |
TransMOT[ | private | 2D | 76.7 | 75.1 | 61.7 | 36 231 | 93 150 | 2 346 | 9.6 |
Table 3 Experimental results of different algorithms on MOT17 dataset
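The MOTA column is derived directly from the FP, FN, and IDs columns: under the CLEAR-MOT definition, MOTA = 1 - (FP + FN + IDs) / GT, where GT is the total number of ground-truth boxes. A minimal sketch (the counts in the usage example are illustrative, not taken from the table):

```python
def mota(fp: int, fn: int, ids: int, num_gt: int) -> float:
    """CLEAR-MOT accuracy: penalizes false positives, misses, and ID switches.

    Can be negative when the total error count exceeds the number of
    ground-truth boxes.
    """
    return 1.0 - (fp + fn + ids) / num_gt
```

For example, `mota(10, 20, 5, 100)` gives 0.65, i.e., 65% of ground-truth boxes remain after charging all three error types.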
Category | Method | Detection | Data | MOTA/%↑ | IDF1/%↑ | HOTA/%↑ | FP↓ | FN↓ | IDs↓ | FPS↑ |
---|---|---|---|---|---|---|---|---|---|---|
DBE | GM-PHD[ | public | no | 44.7 | 43.5 | 35.6 | 42 778 | 236 116 | 7 492 | 25.2 |
UnsupTrack[ | public | no | 53.6 | 50.6 | 41.7 | 6 439 | 231 298 | 2 178 | 1.3 | |
StrongSORT[ | private | CH | 73.8 | 77.0 | 62.6 | 16 632 | 117 920 | 770 | 1.4 | |
DBP | Sp_Con[ | public | no | 54.6 | 53.4 | 42.5 | 14 056 | 200 655 | 2 478 | 7.7 |
DBA | GCNNMatch[ | public | no | 54.5 | 49.0 | 40.2 | 9 522 | 223 611 | 2 038 | 0.1 |
JDE | FairMOT[ | private | 5D1 | 61.8 | 67.3 | 54.6 | 103 440 | 88 901 | 5 243 | 13.2 |
OUTrack[ | public | CH | 65.4 | 65.1 | 52.1 | 38 243 | 137 770 | 2 885 | 5.1 | |
CSTrack[ | private | 5D2 | 66.6 | 68.6 | 54.0 | 25 404 | 144 358 | 3 196 | 4.5 | |
GSDT[ | private | 5D2 | 67.1 | 67.5 | 53.6 | 31 507 | 135 395 | 3 230 | 1.5 | |
RelationTrack[ | private | 5D1 | 67.2 | 70.5 | 56.5 | 61 134 | 104 597 | 4 243 | 4.3 | |
JDP | Tracktor++v2[ | public | no | 52.6 | 52.7 | 42.1 | 6 930 | 236 680 | 1 648 | 1.2 |
TransCenter[ | private | no | 58.5 | 49.6 | 43.5 | 64 217 | 146 019 | 4 695 | 1.0 | |
JDA | TransTrack[ | private | CH | 65.0 | 59.4 | 48.9 | 27 191 | 150 197 | 3 608 | 14.9 |
TrackFormer[ | private | CH | 68.6 | 65.7 | 54.7 | 20 348 | 140 373 | 1 532 | 5.7 | |
TransMOT[ | private | CH | 77.5 | 75.2 | 61.9 | 34 201 | 80 788 | 1 615 | 2.6 |
Table 4 Experimental results of different algorithms on MOT20 dataset
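The HOTA column balances detection and association quality: at a single localization threshold α, HOTA_α is the geometric mean of detection accuracy (DetA_α) and association accuracy (AssA_α), and the reported score averages HOTA_α over a range of thresholds [1]. A minimal sketch of the per-threshold score (the input values in the test are illustrative):

```python
import math


def hota_alpha(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold: geometric mean of DetA and AssA."""
    return math.sqrt(det_a * ass_a)
```

The geometric mean means a tracker cannot compensate for poor association with strong detection (or vice versa), which is why the MOTA and HOTA rankings in the tables above do not always agree.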
[1] | LUITEN J, OSEP A, DENDORFER P, et al. HOTA: a higher order metric for evaluating multi-object tracking[J]. arXiv:2009.07736, 2020. |
[2] | LUO Y, YIN D, WANG A, et al. Pedestrian tracking in surveillance video based on modified CNN[J]. Multimedia Tools and Applications, 2018, 77: 24041-24058. |
[3] | HAO J X, ZHOU Y M, ZHANG G S, et al. A review of target tracking algorithm based on UAV[C]// Proceedings of the 2018 IEEE International Conference on Cyborg and Bionic Systems, Shenzhen, Oct 25-27, 2018. Piscataway:IEEE, 2018: 328-333. |
[4] | LIU C H, ZHANG L, HUANG H. Visualization of cross-view multi-object tracking for surveillance videos in crossroad[J]. Chinese Journal of Computers, 2018, 41(1): 221-235. |
[5] | JIN S L, LI Y, HUANG H N. A unified method for underwater multi-target bearing detection and tracking[J]. Acta Acustica, 2019, 44(4): 503-512. |
[6] | ZHANG Y F, SUN P Z, JIANG Y J, et al. ByteTrack: multi-object tracking by associating every detection box[J]. arXiv:2110.06864, 2021. |
[7] | REN S, HE K, GIRSHICK R B, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 39: 1137-1149. |
[8] | FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D A, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 32: 1627-1645. |
[9] | CHOI W, SAVARESE S. Multiple target tracking in world coordinate with single, minimally calibrated camera[C]// LNCS 6314: Proceedings of the 11th European Conference on Computer Vision, Heraklion, Sep 5-11, 2010. Berlin, Heidelberg: Springer, 2010: 553-567. |
[10] | ZADINIA H, SALEEMI I, LI W H, et al. (MP)2T: multiple people multiple parts tracker[C]// LNCS 7577: Proceedings of the 12th European Conference on Computer Vision, Florence, Oct 7-13, 2012. Berlin, Heidelberg: Springer, 2012: 100-114. |
[11] | SUN Z, CHEN J, CHAO L, et al. A survey of multiple pedestrian tracking based on tracking-by-detection framework[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31: 1819-1833. |
[12] | CIAPARRONE G, SÁNCHEZ F L, TABIK S, et al. Deep learning in video multi-object tracking: a survey[J]. Neurocomputing, 2020, 381: 61-88. |
[13] | ZHANG Y, LU H Z, ZHANG L P, et al. Overview of visual multi-object tracking algorithms with deep learning[J]. Computer Engineering and Applications, 2021, 57(13): 55-66. |
[14] | WANG G A, SONG M L, HWANG J N. Recent advances in embedding methods for multi-object tracking: a survey[J]. arXiv:2205.10766, 2022. |
[15] | BEWLEY A, GE Z Y, OTT L, et al. Simple online and realtime tracking[C]// Proceedings of the 2016 IEEE International Conference on Image Processing, Phoenix, Sep 25-28, 2016. Piscataway: IEEE, 2016: 3464-3468. |
[16] | ZHANG L, GRAY H, YE X J, et al. Automatic individual pig detection and tracking in pig farms[J]. Sensors, 2019, 19(5): 1188. |
[17] | LU Y Y, LU C W, TANG C K. Online video object detection using association LSTM[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 2363-2371. |
[18] | GIRSHICK R B. Fast R-CNN[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 1440-1448. |
[19] | DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[J]. arXiv:1605.06409, 2016. |
[20] | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// LNCS 9905: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 21-37. |
[21] | REDMON J, DIVVALA S K, GIRSHICK R B, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 779-788. |
[22] | REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6517-6525. |
[23] | BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[J]. arXiv:2004.10934, 2020. |
[24] | GE Z, LIU S, WANG F, et al. YOLOX: exceeding YOLO series in 2021[J]. arXiv:2107.08430, 2021. |
[25] | CAO J, WENG X, KHIRODKAR R, et al. Observation-centric sort: rethinking sort for robust multi-object tracking[J]. arXiv:2203.14360, 2022. |
[26] | WOJKE N, BEWLEY A, PAULUS D. Simple online and realtime tracking with a deep association metric[C]// Proceedings of the 2017 IEEE International Conference on Image Processing, Beijing, Sep 17-20, 2017. Piscataway: IEEE, 2017: 3645-3649. |
[27] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778. |
[28] | ZAGORUYKO S, KOMODAKIS N. Wide residual networks[J]. arXiv:1605.07146, 2016. |
[29] | SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 1-9. |
[30] | KARTHIK S, PRABHU A, GANDHI V. Simple unsupervised multi-object tracking[J]. arXiv:2006.02609, 2020. |
[31] | BAISA N L. Occlusion-robust online multi-object visual tracking using a GM-PHD filter with a CNN-based reidentification[J]. Journal of Visual Communication and Image Representation, 2021, 80: 103279. |
[32] | YANG F, CHANG X, SAKTI S, et al. ReMOT: a model-agnostic refinement for multiple object tracking[J]. Image and Vision Computing, 2021, 106: 104091. |
[33] | BAISA N L. Robust online multi-target visual tracking using a HISP filter with discriminative deep appearance learning[J]. Journal of Visual Communication and Image Representation, 2021, 77: 102952. |
[34] | ZHANG Y, SHENG H, WU Y, et al. Multiplex labeling graph for near-online tracking in crowded scenes[J]. IEEE Internet of Things Journal, 2020, 7(9): 7892-7902. |
[35] | FENG W T, HU Z H, WU W, et al. Multi-object tracking with multiple cues and switcher-aware classification[J]. arXiv: 1901.06129, 2019. |
[36] |
牛通, 卿粼波, 许盛宇, 等. 基于深度学习的分层关联多行人跟踪[J]. 计算机工程与应用, 2021, 57(8): 96-102.
DOI |
NIU T, QING L B, XU S Y, et al. Multiple target tracking using hierarchical data association based on deep learning[J]. Computer Engineering and Applications, 2021, 57(8): 96-102.
DOI |
|
[37] | CHEN L, AI H Z, ZHUANG Z J, et al. Real-time multiple people tracking with deeply learned candidate selection and person re-identification[C]// Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, San Diego, Jul 23-27, 2018. Washington: IEEE Computer Society, 2018: 1-6. |
[38] | DU Y H, SONG Y, YANG B, et al. StrongSORT: make DeepSORT great again[J]. arXiv: 2202.13514, 2022. |
[39] | LIU Q, LIU B, WU Y, et al. Real-time online multi-object tracking in compressed domain[J]. IEEE Access, 2019, 7: 76489-76499. |
[40] | LI W, XIONG Y J, YANG S, et al. Semi-TCL: semi-supervised track contrastive representation learning[J]. arXiv:2107.02396, 2021. |
[41] | WANG Z D, ZHENG L, LIU Y X, et al. Towards real-time multi-object tracking[J]. arXiv:1909.12605, 2019. |
[42] | BABAEE M, LI Z M, RIGOLL G. Occlusion handling in tracking multiple people using RNN[C]// Proceedings of the 2018 IEEE International Conference on Image Processing,Athens, Oct 7-10, 2018. Piscataway: IEEE, 2018: 2715-2719. |
[43] | HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9: 1735-1780. |
[44] | GIRBAU A, GIRO-I-NIETO X, RIUS I, et al. Multiple object tracking with mixture density networks for trajectory estimation[J]. arXiv:2106.10950, 2021. |
[45] | HAN S, HUANG P, WANG H, et al. MAT: motion-aware multi-object tracking[J]. arXiv:2009.04794, 2020. |
[46] | EVANGELIDIS G D, PSARAKIS E Z. Parametric image alignment using enhanced correlation coefficient maximization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(10): 1858-1865. |
[47] | SALEH F S, ALIAKBARIAN S, REZATOFIGHI H, et al. Probabilistic tracklet scoring and inpainting for multiple object tracking[C]// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 14329-14339. |
[48] | BABAEE M, ATHAR A, RIGOLL G. Multiple people tracking using hierarchical deep tracklet re-identification[J]. arXiv:1811.04091, 2018. |
[49] | FANG K, XIANG Y, LI X C, et al. Recurrent autoregressive networks for online multi-object tracking[C]// Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, Mar 12-15, 2018. Washington: IEEE Computer Society, 2018: 466-475. |
[50] | WANG S, SHENG H, ZHANG Y, et al. A general recurrent tracking framework without real data[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 10-17, 2021. Piscataway: IEEE, 2021: 13199-13208. |
[51] | WANG G A, WANG Y Z, GU R S, et al. Split and connect: a universal tracklet booster for multi-object tracking[J]. arXiv:2105.02426, 2021. |
[52] | DAI P, WENG R L, CHOI W, et al. Learning a proposal classifier for multiple object tracking[C]// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 2443-2452. |
[53] | KIPF T, WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv:1609.02907, 2016. |
[54] | PAPAKIS I, SARKAR A, KARPATNE A. GCNNMatch: graph convolutional neural networks for multi-object tracking via sinkhorn normalization[J]. arXiv:2010.00067, 2020. |
[55] | XU Y, BAN Y, ALAMEDA-PINEDA X, et al. DeepMOT: a differentiable framework for training multiple object trackers[J]. arXiv:1906.06618, 2019. |
[56] | SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681. |
[57] | JIANG X, LI P Z, LI Y L, et al. Graph neural based end-to-end data association framework for online multiple-object tracking[J]. arXiv:1907.05315, 2019. |
[58] | SHAN C B, WEI C B, DENG B, et al. Tracklets predicting based adaptive graph tracking[J]. arXiv:2010.09015, 2020. |
[59] | WENG X S, WANG Y X, MAN Y Z, et al. GNN3DMOT: graph neural network for 3D multi-object tracking with multi-feature learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition,Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 6498-6507. |
[60] | WENG X, YUAN Y, KITANI K. PTP: parallelized tracking and prediction with graph neural networks and diversity sampling[J]. IEEE Robotics and Automation Letters, 2021, 6(3): 4640-4647. |
[61] | LI J H, GAO X, JIANG T T. Graph networks for multiple object tracking[C]// Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision, Snowmass Village, Mar 1-5, 2020. Piscataway: IEEE, 2020: 708-717. |
[62] | CHU P, WANG J, YOU Q Z, et al. TransMOT: spatial-temporal graph transformer for multiple object tracking[J]. arXiv:2104.00194, 2021. |
[63] | STADLER D, BEYERER J. Modelling ambiguous assignments for multi-person tracking in crowds[C]// Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, Waikoloa, Jan 4-8, 2022. Piscataway: IEEE, 2022: 133-142. |
[64] | REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv:1804.02767, 2018. |
[65] | ZHANG Y F, WANG C Y, WANG X G, et al. FairMOT: on the fairness of detection and re-identification in multiple object tracking[J]. International Journal of Computer Vision, 2021, 129(11): 3069-3087. |
[66] | LI J X, DING Y, WEI H L. SimpleTrack: rethinking and improving the JDE approach for multi-object tracking[J]. arXiv:2203.03985, 2022. |
[67] |
单兆晨, 黄丹丹, 耿振野, 等. 免锚检测的行人多目标跟踪算法[J]. 计算机工程与应用, 2022, 58(10): 145-152.
DOI |
SHAN Z C, HUANG D D, GENG Z Y, et al. Pedestrian multi-object tracking algorithm of anchor-free detection[J]. Computer Engineering and Applications, 2022, 58(10): 145-152.
DOI |
|
[68] | ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points[J]. arXiv:1904.07850, 2019. |
[69] | YU F, WANG D Q, SHELHAMER E, et al. Deep layer aggregation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 2403-2412. |
[70] | YANG J M, GE H W, YANG J L, et al. Online multi-object tracking using multi-function integration and tracking simulation training[J]. Applied Intelligence, 2022, 52: 1268-1288. |
[71] | LU Z C, RATHOD V, VOTEL R, et al. RetinaTrack: online single stage joint detection and tracking[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway:IEEE, 2020: 14656-14666. |
[72] | LIANG C, ZHANG Z, LU Y, et al. Rethinking the competition between detection and ReID in multi-object tracking[J]. arXiv:2010.12138, 2020. |
[73] | LIANG C, ZHANG Z P, ZHOU X, et al. One more check: making “fake background” be tracked again[J]. arXiv:2104.09441, 2021. |
[74] | YU E, LI Z L, HAN S D, et al. RelationTrack: relation-aware multiple object tracking with decoupled representation[J]. arXiv:2105.04322, 2021. |
[75] | WANG Y X, WENG X S, KITANI K. Joint detection and multi-object tracking with graph neural networks[J]. arXiv:2006.13164, 2020. |
[76] | WANG Q, ZHENG Y, PAN P, et al. Multiple object tracking with correlation learning[C]// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 3876-3886. |
[77] | LIU Q, CHEN D, CHU Q, et al. Online multi-object tracking with unsupervised re-identification learning and occlusion estimation[J]. Neurocomputing, 2022, 483: 333-347. |
[78] | PANG B, LI Y Z, ZHANG Y F, et al. TubeTK: adopting tubes to track multi-object in a one-step training model[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 6307-6317. |
[79] | REZATOFIGHI S H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach,Jun 16-20, 2019. Piscataway: IEEE, 2019: 658-666. |
[80] | LIN T Y, GOYAL P, GIRSHICK R B, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42: 318-327. |
[81] | SUN S J, AKHTAR N, SONG X Y, et al. Simultaneous detection and tracking with motion modelling for multiple object tracking[J]. arXiv:2008.08826, 2020. |
[82] | ZHOU X, KOLTUN V, KRÄHENBÜHL P. Tracking objects as points[J]. arXiv:2004.01177, 2020. |
[83] | TOKMAKOV P, LI J, BURGARD W, et al. Learning to track with object permanence[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 10-17, 2021. Piscataway: IEEE, 2021: 10840-10849. |
[84] | WAN X Y, ZHOU S P, WANG J J, et al. Multiple object tracking by trajectory map regression with temporal priors embedding[C]// Proceedings of the 2021 ACM Multimedia Conference. New York: ACM, 2021: 1377-1386. |
[85] | WU J L, CAO J L, SONG L C, et al. Track to detect and segment: an online multi-object tracker[C]// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 12347-12356. |
[86] | DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 764-773. |
[87] | PENG J L, WANG C, WAN F B, et al. Chained-Tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking[J]. arXiv:2007.14557, 2020. |
[88] | XU Y H, BAN Y T, DELORME G, et al. TransCenter: transformers with dense queries for multiple-object tracking[J]. arXiv:2103.15145, 2021. |
[89] | BERGMANN P, MEINHARDT T, LEAL-TAIXÉ L. Tracking without bells and whistles[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 941-951. |
[90] | E GUI, WANG Y X. Multi-candidate association online multi-target tracking based on R-FCN framework[J]. Opto-Electronic Engineering, 2020, 47(1): 29-37. |
[91] | ZHANG J Y, ZHOU S P, CHANG X, et al. Multiple object tracking by flowing and fusing[J]. arXiv:2001.11180, 2020. |
[92] | ZHU J, YANG H, LIU N, et al. Online multi-object tracking with dual matching attention networks[C]// LNCS 11209: Proceedings of the 15th European Conference on Computer Vision, Sep 8-14, 2018. Cham: Springer, 2018: 379-396. |
[93] | CHU Q, OUYANG W L, LIU B, et al. DASOT: a unified framework integrating data association and single object tracking for online multi-object tracking[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 10672-10679. |
[94] | LIN T Y, DOLLÁR P, GIRSHICK R B, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 936-944. |
[95] | ZHENG L Y, TANG M, CHEN Y Y, et al. Improving multiple object tracking with single object tracking[C]// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 2453-2462. |
[96] | VASWANI A, SHAZEER N M, PARMAR N, et al. Attention is all you need[J]. arXiv:1706.03762, 2017. |
[97] | MEINHARDT T, KIRILLOV A, LEAL-TAIXE L, et al. TrackFormer: multi-object tracking with transformers[J]. arXiv:2101.02702, 2021. |
[98] | CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[J]. arXiv:2005.12872, 2020. |
[99] | ZENG F G, DONG B, WANG T C, et al. MOTR: end-to-end multiple-object tracking with transformer[J]. arXiv:2105.03247, 2021. |
[100] | SUN P Z, JIANG Y, ZHANG R F, et al. TransTrack: multiple-object tracking with transformer[J]. arXiv:2012.15460, 2020. |
[101] | SUN P Z, CAO J K, JIANG Y, et al. DanceTrack: multi-object tracking in uniform appearance and diverse motion[J]. arXiv:2111.14690, 2021. |
[102] | SUNDARARAMAN R, BRAGA C, MARCHAND É, et al. Tracking pedestrian heads in dense crowd[C]// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 3865-3875. |
[103] | MILAN A, LEAL-TAIXÉ L, REID I D, et al. MOT16: a benchmark for multi-object tracking[J]. arXiv:1603.00831, 2016. |
[104] | DENDORFER P, REZATOFIGHI H, MILAN A, et al. MOT20: a benchmark for multi object tracking in crowded scenes[J]. arXiv:2003.09003, 2020. |
[105] | VOIGTLAENDER P, KRAUSE M, OSEP A, et al. MOTS: multi-object tracking and segmentation[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 7942-7951. |
[106] | CHEN G L, WANG W G, HE Z J, et al. VisDrone-MOT2021: the vision meets drone multiple object tracking challenge results[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 11-17, 2021. Piscataway: IEEE, 2021: 2839-2846. |
[107] | WEN L Y, DU D W, CAI Z W, et al. UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking[J]. Computer Vision and Image Understanding, 2020, 193: 102907. |
[108] | GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: the KITTI dataset[J]. The International Journal of Robotics Research, 2013, 32(11): 1231-1237. |
[109] | SUN S, AKHTAR N, SONG H S, et al. Deep affinity network for multiple object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(1): 104-119. |