High Frame Rate Light-Weight Siamese Network Target Tracking

doi:10.3778/j.issn.1673-9418.2012016

Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (6): 1405-1416.DOI: 10.3778/j.issn.1673-9418.2012016

• Graphics and Image •

LI Yunhuan, WEN Jiwei, PENG Li

Engineering Research Center of Internet of Things Technology Applications (School of Internet of Things Engineering, Jiangnan University), Ministry of Education, Wuxi, Jiangsu 214122, China

Received: 2020-12-03 Revised: 2021-01-29 Online: 2022-06-01 Published: 2021-02-04
About the authors: LI Yunhuan, born in 1998, M.S. candidate. His research interests include deep learning, computer vision and target tracking.
WEN Jiwei, born in 1981, Ph.D., associate professor, M.S. supervisor. His research interests include stochastic switched systems, model predictive control, T-S fuzzy modeling and control.
PENG Li, born in 1967, Ph.D., professor, Ph.D. supervisor, member of CAAI and CCF. His research interests include visual Internet of things, action recognition and deep learning.
Supported by: National Key Research and Development Program of China (2018YFD0400902); National Natural Science Foundation of China (61873112)

Chinese title: 高帧率的轻量级孪生网络目标跟踪

Authors (Chinese names): 李运寰, 闻继伟, 彭力

Corresponding author: PENG Li, E-mail: penglimail2002@163.com
Native places: LI Yunhuan, Yancheng, Jiangsu; WEN Jiwei, Wuxi, Jiangsu; PENG Li, Tangshan, Hebei.

Abstract:

As target tracking spreads to more everyday scenarios, demand is growing for tracking algorithms that are both accurate and fast. On mobile terminals, embedded devices and similar platforms, computing power is limited, yet the tracker must still deliver good accuracy at high real-time speed. To address this problem, a high frame rate tracking algorithm based on a light-weight siamese network is proposed. Firstly, the light-weight convolutional neural network MobileNetV1, which is easy to deploy on embedded devices, is selected as the feature extraction backbone, because a deeper network is more capable of extracting target features. Then, two optimization strategies are proposed to address the shortcomings of the backbone: the feature maps are cropped and the total network stride is adjusted, which makes the backbone suitable for tracking tasks. Finally, an ultra-lightweight channel attention module is added after the template branch of the siamese network to weight the important information that highlights the target characteristics. Compared with the current mainstream algorithm SiamFC, the proposed algorithm has 59.8% fewer parameters. Experimental results on the OTB2015 dataset show that the tracking accuracy is increased by 5.4%, and that the algorithm copes better with the complex and changeable challenges of tracking tasks. Experimental results on the VOT2018 dataset show that the comprehensive index expected average overlap (EAO) is increased by 26.6%, and the average speed of the algorithm on an NVIDIA GTX1080Ti is 120 frame/s, which achieves high frame rate real-time tracking.

Key words: target tracking, MobileNet, siamese networks, channel attention mechanism
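
The pipeline in the abstract ends with a matching step between the template and search branches. A minimal sketch of that step follows, assuming SiamFC-style cross-correlation (the abstract does not spell out the matching operator) and treating the channel attention output as a given weight vector, since this record does not describe how the module computes it. `channel_reweight`, `cross_correlate`, the 8-channel toy features and the all-ones `weights` are illustrative assumptions, not the authors' code.

```python
import numpy as np


def channel_reweight(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Scale each channel of an (h, w, c) feature map by its own weight."""
    return features * weights[np.newaxis, np.newaxis, :]


def cross_correlate(template: np.ndarray, search: np.ndarray) -> np.ndarray:
    """Slide the template over the search features and sum products per offset."""
    h, w, c = template.shape
    H, W, c_search = search.shape
    if c != c_search:
        raise ValueError("template and search must have the same channel count")
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            window = search[i:i + h, j:j + w, :]
            response[i, j] = float(np.sum(window * template))
    return response


rng = np.random.default_rng(0)
template = rng.normal(size=(7, 7, 8))   # toy exemplar features (8 channels for brevity)
search = np.zeros((23, 23, 8))          # toy search features
search[5:12, 9:16, :] = template        # plant the target at offset (5, 9)

weights = np.ones(8)                    # hypothetical attention output (no-op here)
response = cross_correlate(channel_reweight(template, weights), search)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(response), response.shape))
print(response.shape, peak)
```

With a 7×7 template and a 23×23 search map the response is 17×17, and its peak falls at the offset where the target was planted.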

CLC Number: TP391

LI Yunhuan, WEN Jiwei, PENG Li. High Frame Rate Light-Weight Siamese Network Target Tracking[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1405-1416.

李运寰, 闻继伟, 彭力. 高帧率的轻量级孪生网络目标跟踪[J]. 计算机科学与探索, 2022, 16(6): 1405-1416.

Models	Parameters/10⁶	FLOPs/10⁶
AlexNet	62.37	2 211
VGG16	138.35	30 234
ResNet18	11.68	3 555
MobileNetV1	4.23	1 132
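
MobileNetV1's small footprint in this table comes largely from replacing standard convolutions with depthwise separable ones. The sketch below compares weight counts for a single 3×3 layer with 512 input and 512 output channels, a size that appears in the backbone table of this record. The function names are illustrative and biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1x1 conv that mixes the channels
    return depthwise + pointwise


kernel, c_in, c_out = 3, 512, 512
standard = standard_conv_params(kernel, c_in, c_out)
separable = separable_conv_params(kernel, c_in, c_out)
ratio = separable / standard
closed_form = 1 / c_out + 1 / kernel ** 2   # the same ratio in closed form

print(standard, separable, round(ratio, 4))
```

For this layer the separable form needs roughly 11% of the weights of the standard convolution, and the ratio equals 1/c_out + 1/kernel².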

Layer name	Kernel size	Stride	Padding	Operators	Activation size (exemplar)	Activation size (search)
Input					127×127×3	255×255×3
Layer1	3×3×32	2	1	Standard Conv	64×64×32	128×128×32
Layer2	3×3×32	1	1	Dw Conv	64×64×32	128×128×32
	1×1×32×64	1	0	Pw Conv	64×64×64	128×128×64
				Crop	62×62×64	126×126×64
Layer3	3×3×64	2	1	Dw Conv	31×31×64	63×63×64
	1×1×64×128	1	0	Pw Conv	31×31×128	63×63×128
Layer4	3×3×128	1	1	Dw Conv	31×31×128	63×63×128
	1×1×128×128	1	0	Pw Conv	31×31×128	63×63×128
				Crop	29×29×128	61×61×128
Layer5	3×3×128	2	1	Dw Conv	15×15×128	31×31×128
	1×1×128×256	1	0	Pw Conv	15×15×256	31×31×256
Layer6	3×3×256	1	1	Dw Conv	15×15×256	31×31×256
	1×1×256×256	1	0	Pw Conv	15×15×256	31×31×256
				Crop	13×13×256	29×29×256
Layer7	3×3×256	1	1	Dw Conv	13×13×256	29×29×256
	1×1×256×512	1	0	Pw Conv	13×13×512	29×29×512
				Crop	11×11×512	27×27×512
Layer8	3×3×512	1	1	Dw Conv	11×11×512	27×27×512
	1×1×512×512	1	0	Pw Conv	11×11×512	27×27×512
				Crop	9×9×512	25×25×512
Layer9	3×3×512	1	1	Dw Conv	9×9×512	25×25×512
	1×1×512×512	1	0	Pw Conv	9×9×512	25×25×512
				Crop	7×7×512	23×23×512
Layer10	1×1×256	1	0	Standard Conv	7×7×256	23×23×256
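
The activation sizes in this table follow from the usual convolution output formula plus a two-element crop after each stride-1 depthwise separable layer. The sketch below replays that arithmetic. `conv_out`, `LAYERS`, `backbone_size` and `total_stride` are illustrative names, and reading "Crop" as removing one border element per side is an assumption that matches the listed sizes.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer name, stride of its 3x3 conv, whether a crop follows), read from the table.
LAYERS = [
    ("Layer1", 2, False),
    ("Layer2", 1, True),
    ("Layer3", 2, False),
    ("Layer4", 1, True),
    ("Layer5", 2, False),
    ("Layer6", 1, True),
    ("Layer7", 1, True),
    ("Layer8", 1, True),
    ("Layer9", 1, True),
]


def backbone_size(size: int) -> int:
    """Spatial size after Layer1..Layer10 for a square input of side `size`."""
    for _name, stride, crop in LAYERS:
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        if crop:
            size -= 2   # assumed: drop one border element on each side
    # The 1x1 pointwise convs and Layer10 (stride 1, no padding) keep the size.
    return conv_out(size, kernel=1, stride=1, padding=0)


def total_stride() -> int:
    """Product of all layer strides."""
    result = 1
    for _name, stride, _crop in LAYERS:
        result *= stride
    return result


print(backbone_size(127), backbone_size(255), total_stride())
```

Under these assumptions the exemplar shrinks from 127 to 7 and the search region from 255 to 23, with a total stride of 8, compared with 32 for the stock MobileNetV1 classifier.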

Tracker	Prec	AUC	Speed/(frame/s)
Ours	0.813	0.610	120
SRDCF	0.789	0.598	4
SiamTri	0.784	0.590	82
CFNet	0.781	0.587	75
SiamFC	0.771	0.582	86
Staple	0.771	0.578	56
SiamSqueeze	0.754	0.564	110
fDSST	0.687	0.517	54
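
The 5.4% accuracy gain quoted in the abstract can be checked against this table as a relative improvement over SiamFC. The snippet below does that arithmetic; the two rows are copied from the table and `relative_gain` is an illustrative helper.

```python
# Rows copied from the OTB2015 table: (precision, AUC, speed in frame/s).
OTB2015 = {
    "Ours": (0.813, 0.610, 120),
    "SiamFC": (0.771, 0.582, 86),
}


def relative_gain(new: float, base: float) -> float:
    """Fractional improvement of `new` over `base`."""
    return (new - base) / base


ours, baseline = OTB2015["Ours"], OTB2015["SiamFC"]
prec_gain = relative_gain(ours[0], baseline[0])
auc_gain = relative_gain(ours[1], baseline[1])
speedup = ours[2] / baseline[2]

print(f"precision +{prec_gain:.1%}, AUC +{auc_gain:.1%}, speed x{speedup:.2f}")
```

The precision column reproduces the 5.4% figure, which suggests the abstract's "tracking accuracy" refers to Prec rather than AUC.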

Trackers	Accuracy (A)	Robustness (R)	EAO
Ours	0.521	0.520	0.238
UNet-SiamFC	0.490	0.580	0.214
DSiam	0.512	0.646	0.196
SiamFC	0.503	0.585	0.188
DCFNet	0.470	0.543	0.182
DensSiam	0.462	0.688	0.174
Staple	0.530	0.688	0.169
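
The 26.6% EAO improvement quoted in the abstract is the relative gain over SiamFC in this table, as the following worked equation shows (values taken from the rows "Ours" and "SiamFC"):

```latex
\[
\frac{\mathrm{EAO}_{\text{Ours}} - \mathrm{EAO}_{\text{SiamFC}}}{\mathrm{EAO}_{\text{SiamFC}}}
= \frac{0.238 - 0.188}{0.188}
= \frac{0.050}{0.188}
\approx 0.266 = 26.6\%
\]
```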

Algorithm	Prec	AUC	Speed/(frame/s)	Parameters
SiamFC	0.771	0.582	86	2 334 080
Experiment 1	0.790	0.592
Experiment 2	0.463	0.354
Experiment 3	0.791	0.594
Experiment 4	0.813	0.610	120	938 048
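
The parameter column can be cross-checked against the backbone table of this record. The sketch below is one possible accounting, not a breakdown given by the authors. It assumes bias-free convolutions, each followed by a batch-norm scale and shift in Layer1 to Layer9; a bias and no batch norm on Layer10; 512 input channels for Layer10 (the table lists only 1×1×256); and no counted parameters in the attention module. Under those assumptions the total matches the reported 938 048.

```python
# (input channels, output channels) of the depthwise separable layers
# Layer2..Layer9, read from the backbone table.
SEPARABLE = [
    (32, 64), (64, 128), (128, 128), (128, 256),
    (256, 256), (256, 512), (512, 512), (512, 512),
]


def count_parameters() -> int:
    """Parameter total under the assumptions stated in the lead-in."""
    total = 3 * 3 * 3 * 32                    # Layer1: standard 3x3 conv, RGB -> 32
    total += 2 * 32                           # assumed batch-norm scale and shift
    for c_in, c_out in SEPARABLE:
        total += 3 * 3 * c_in + 2 * c_in      # depthwise 3x3 + assumed batch norm
        total += c_in * c_out + 2 * c_out     # pointwise 1x1 + assumed batch norm
    total += 512 * 256 + 256                  # Layer10: 1x1 conv with assumed bias
    return total


ours = count_parameters()
siamfc = 2_334_080                            # SiamFC row of the table
reduction = 1 - ours / siamfc

print(ours, f"{reduction:.1%}")
```

The same two totals give the 59.8% parameter reduction relative to SiamFC quoted in the abstract.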
