Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (6): 1405-1416. DOI: 10.3778/j.issn.1673-9418.2012016
• Graphics and Image •
LI Yunhuan, WEN Jiwei, PENG Li
Received: 2020-12-03
Revised: 2021-01-29
Online: 2022-06-01
Published: 2021-02-04
About author: LI Yunhuan, born in 1998, from Yancheng, Jiangsu, M.S. candidate. His research interests include deep learning, computer vision and target tracking.
Corresponding author: E-mail: penglimail2002@163.com
LI Yunhuan, WEN Jiwei, PENG Li. High Frame Rate Light-Weight Siamese Network Target Tracking[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1405-1416.
李运寰, 闻继伟, 彭力. 高帧率的轻量级孪生网络目标跟踪[J]. 计算机科学与探索, 2022, 16(6): 1405-1416.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2012016
Models | Parameters/10^6 | FLOPs/10^6 |
---|---|---|
AlexNet | 62.37 | 2 211 |
VGG16 | 138.35 | 30 234 |
ResNet18 | 11.68 | 3 555 |
MobileNetV1 | 4.23 | 1 132 |
Table 1 Parameters comparison of various neural networks
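The low MobileNetV1 figures in Table 1 are commonly attributed to depthwise separable convolution, the same Dw Conv / Pw Conv pairing listed as operators in Table 2. The sketch below is only an illustration of that cost argument, not code from the article: the function names are hypothetical, and the example layer (3×3 kernel, 256 input channels, 512 output channels, 13×13 output) is chosen to resemble one stage of Table 2.

```python
from fractions import Fraction


def standard_conv_madds(k: int, c_in: int, c_out: int, out_size: int) -> int:
    """Multiply-adds of a standard k x k convolution on a square output map."""
    return k * k * c_in * c_out * out_size * out_size


def separable_conv_madds(k: int, c_in: int, c_out: int, out_size: int) -> int:
    """Multiply-adds of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in * out_size * out_size
    pointwise = c_in * c_out * out_size * out_size
    return depthwise + pointwise


def cost_ratio(k: int, c_in: int, c_out: int, out_size: int) -> Fraction:
    """Separable cost divided by standard cost; simplifies to 1/c_out + 1/k**2."""
    return Fraction(
        separable_conv_madds(k, c_in, c_out, out_size),
        standard_conv_madds(k, c_in, c_out, out_size),
    )


if __name__ == "__main__":
    ratio = cost_ratio(k=3, c_in=256, c_out=512, out_size=13)
    print(ratio, float(ratio))
```

For a 3×3 kernel the ratio stays close to 1/9 whenever the output channel count is large, which is in line with the roughly order-of-magnitude gap between MobileNetV1 and the heavier backbones in the table.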
Layer name | Kernel size | Stride | Padding | Operators | Activation size (for exemplar) | Activation size (for search) |
---|---|---|---|---|---|---|
Input | | | | | 127×127×3 | 255×255×3 |
Layer1 | 3×3×32 | 2 | 1 | Standard Conv | 64×64×32 | 128×128×32 |
Layer2 | 3×3×32 | 1 | 1 | Dw Conv | 64×64×32 | 128×128×32 |
 | 1×1×32×64 | 1 | 0 | Pw Conv | 64×64×64 | 128×128×64 |
Crop | | | | | 62×62×64 | 126×126×64 |
Layer3 | 3×3×64 | 2 | 1 | Dw Conv | 31×31×64 | 63×63×64 |
 | 1×1×64×128 | 1 | 0 | Pw Conv | 31×31×128 | 63×63×128 |
Layer4 | 3×3×128 | 1 | 1 | Dw Conv | 31×31×128 | 63×63×128 |
 | 1×1×128×128 | 1 | 0 | Pw Conv | 31×31×128 | 63×63×128 |
Crop | | | | | 29×29×128 | 61×61×128 |
Layer5 | 3×3×128 | 2 | 1 | Dw Conv | 15×15×128 | 31×31×128 |
 | 1×1×128×256 | 1 | 0 | Pw Conv | 15×15×256 | 31×31×256 |
Layer6 | 3×3×256 | 1 | 1 | Dw Conv | 15×15×256 | 31×31×256 |
 | 1×1×256×256 | 1 | 0 | Pw Conv | 15×15×256 | 31×31×256 |
Crop | | | | | 13×13×256 | 29×29×256 |
Layer7 | 3×3×256 | 1 | 1 | Dw Conv | 13×13×256 | 29×29×256 |
 | 1×1×256×512 | 1 | 0 | Pw Conv | 13×13×512 | 29×29×512 |
Crop | | | | | 11×11×512 | 27×27×512 |
Layer8 | 3×3×512 | 1 | 1 | Dw Conv | 11×11×512 | 27×27×512 |
 | 1×1×512×512 | 1 | 0 | Pw Conv | 11×11×512 | 27×27×512 |
Crop | | | | | 9×9×512 | 25×25×512 |
Layer9 | 3×3×512 | 1 | 1 | Dw Conv | 9×9×512 | 25×25×512 |
 | 1×1×512×512 | 1 | 0 | Pw Conv | 9×9×512 | 25×25×512 |
Crop | | | | | 7×7×512 | 23×23×512 |
Layer10 | 1×1×256 | 1 | 0 | Standard Conv | 7×7×256 | 23×23×256 |
Table 2 Architecture of siamese network based on MobileNetV1
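The activation sizes in Table 2 can be checked with the usual convolution output-size formula plus a crop that trims one pixel from each border. The sketch below is a hypothetical helper, not the authors' code: the stage list is transcribed from the table, the one-pixel crop is inferred from the listed sizes (for example 64 to 62), and the pointwise 1×1 convolutions are omitted because they leave the spatial size unchanged.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution, using floor division."""
    return (size + 2 * padding - kernel) // stride + 1


def crop(size: int, border: int = 1) -> int:
    """Remove `border` pixels from each side (assumed from the sizes in Table 2)."""
    return size - 2 * border


# (name, kernel, stride, padding, crop_after), transcribed from Table 2.
STAGES = [
    ("Layer1", 3, 2, 1, False),
    ("Layer2", 3, 1, 1, True),
    ("Layer3", 3, 2, 1, False),
    ("Layer4", 3, 1, 1, True),
    ("Layer5", 3, 2, 1, False),
    ("Layer6", 3, 1, 1, True),
    ("Layer7", 3, 1, 1, True),
    ("Layer8", 3, 1, 1, True),
    ("Layer9", 3, 1, 1, True),
    ("Layer10", 1, 1, 0, False),
]


def trace(size: int) -> list:
    """Return (layer name, spatial size after the layer and any crop) per stage."""
    sizes = []
    for name, kernel, stride, padding, crop_after in STAGES:
        size = conv_out(size, kernel, stride, padding)
        if crop_after:
            size = crop(size)
        sizes.append((name, size))
    return sizes


if __name__ == "__main__":
    for label, side in (("exemplar", 127), ("search", 255)):
        print(label, trace(side))
```

Running the trace on 127 and 255 ends at 7 and 23 respectively, matching the Layer10 row for the exemplar and search branches.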
Tracker | Prec | AUC | Speed/(frame/s) |
---|---|---|---|
Ours | 0.813 | 0.610 | 120 |
SRDCF | 0.789 | 0.598 | 4 |
SiamTri | 0.784 | 0.590 | 82 |
CFNet | 0.781 | 0.587 | 75 |
SiamFC | 0.771 | 0.582 | 86 |
Staple | 0.771 | 0.578 | 56 |
SiamSqueeze | 0.754 | 0.564 | 110 |
fDSST | 0.687 | 0.517 | 54 |
Table 3 Performance comparison of each tracker on OTB2015
Trackers | Accuracy (A) | Robustness (R) | EAO |
---|---|---|---|
Ours | 0.521 | 0.520 | 0.238 |
UNet-SiamFC | 0.490 | 0.580 | 0.214 |
DSiam | 0.512 | 0.646 | 0.196 |
SiamFC | 0.503 | 0.585 | 0.188 |
DCFNet | 0.470 | 0.543 | 0.182 |
DensSiam | 0.462 | 0.688 | 0.174 |
Staple | 0.530 | 0.688 | 0.169 |
Table 4 Performance comparison of each tracker on VOT2018
Algorithm | Prec | AUC | Speed/(frame/s) | Parameters |
---|---|---|---|---|
SiamFC | 0.771 | 0.582 | 86 | 2 334 080 |
Experiment 1 | 0.790 | 0.592 | | |
Experiment 2 | 0.463 | 0.354 | | |
Experiment 3 | 0.791 | 0.594 | | |
Experiment 4 | 0.813 | 0.610 | 120 | 938 048 |
Table 5 Ablation experiment of proposed algorithm and benchmark algorithms on OTB2015
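The parameter column of Table 5 can be roughly cross-checked against the layer list in Table 2. The sketch below counts convolution weights only and is an estimate under stated assumptions, not the authors' accounting: the channel widths are transcribed from Table 2, the 512 input channels of Layer10 are inferred from the preceding activation size, and bias or normalization parameters are deliberately left out because the table does not list them.

```python
# (in_channels, out_channels, kernel, depthwise_separable), from Table 2.
LAYERS = [
    (3, 32, 3, False),     # Layer1: standard conv on the RGB input
    (32, 64, 3, True),     # Layer2
    (64, 128, 3, True),    # Layer3
    (128, 128, 3, True),   # Layer4
    (128, 256, 3, True),   # Layer5
    (256, 256, 3, True),   # Layer6
    (256, 512, 3, True),   # Layer7
    (512, 512, 3, True),   # Layer8
    (512, 512, 3, True),   # Layer9
    (512, 256, 1, False),  # Layer10: 1x1 standard conv (input width inferred)
]

REPORTED_OURS = 938_048      # Table 5, proposed network
REPORTED_SIAMFC = 2_334_080  # Table 5, SiamFC baseline


def conv_weights(c_in: int, c_out: int, k: int, separable: bool) -> int:
    """Weight count of one layer, excluding bias and normalization terms."""
    if separable:
        return k * k * c_in + c_in * c_out  # depthwise + pointwise
    return k * k * c_in * c_out


def total_weights() -> int:
    return sum(conv_weights(*layer) for layer in LAYERS)


if __name__ == "__main__":
    total = total_weights()
    print("conv weights from Table 2:", total)
    print("gap to reported count:", REPORTED_OURS - total)
    print("reported ratio to SiamFC:", round(REPORTED_OURS / REPORTED_SIAMFC, 3))
```

Under these assumptions the convolution weights account for nearly all of the reported 938 048 parameters; the small remainder may correspond to normalization or bias terms, which the source does not itemize. The reported counts themselves put the proposed network at about 40% of the SiamFC parameter count.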
[1] | 卢湖川, 李佩霞, 王栋. 目标跟踪算法综述[J]. 模式识别与人工智能, 2018, 31(1): 61-67. |
LU H C, LI P X, WANG D. Visual object tracking: a survey[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(1): 61-67. | |
[2] | HENRIQUES J F, CASEIRO R, MARTINS P, et al. Exploiting the circulant structure of tracking-by-detection with kernels[C]// LNCS 7575: Proceedings of the 12th European Conference on Computer Vision, Florence, Oct 7-13, 2012. Berlin, Heidelberg: Springer, 2012: 702-715. |
[3] | HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. |
[4] | 方路平, 何杭江, 周国民. 目标检测算法研究综述[J]. 计算机工程与应用, 2018, 54(13): 11-18. |
FANG L P, HE H J, ZHOU G M. Research overview of object detection methods[J]. Computer Engineering and Applications, 2018, 54(13): 11-18. | |
[5] | NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 4293-4302. |
[6] | BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional siamese networks for object tracking[C]// LNCS 9914: Proceedings of the 14th European Conference on Computer Vision—ECCV Workshops 2016, Amsterdam, Oct 8-10, 15-16, 2016. Cham: Springer, 2016: 850-865. |
[7] | HE A, LUO C, TIAN X, et al. A twofold siamese network for real-time object tracking[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-23, 2018. Washington: IEEE Computer Society, 2018: 4834-4843. |
[8] | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. |
[9] | HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17)[2020-05-13]. https://arxiv.org/abs/1704.04861 . |
[10] | ZHANG Z, PENG H. Deeper and wider siamese networks for real-time visual tracking[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 4591-4600. |
[11] | HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 7132-7141. |
[12] | WOO S, PARK J, LEE J, et al. CBAM: convolutional block attention module[C]// LNCS 11211: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 3-19. |
[13] | DONG X P, SHEN J B. Triplet loss in siamese network for object tracking[C]// LNCS 11217: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 472-488. |
[14] | VALMADRE J, BERTINETTO L, HENRIQUES J F, et al. End-to-end representation learning for correlation filter based tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 22-25, 2017. Washington: IEEE Computer Society, 2017: 5000-5008. |
[15] | DANELLJAN M, HAGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 4310-4318. |
[16] | BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: complementary learners for real-time tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 1401-1409. |
[17] | DANELLJAN M, HAGER G, KHAN F S, et al. Discriminative scale space tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(8): 1561-1575. |
[18] | ZHANG L C, GONZALEZ-GARCIA A, WEIJER J, et al. Learning the model update for siamese trackers[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 4009-4018. |
[19] | GUO Q, FENG W, ZHOU C, et al. Learning dynamic siamese network for visual object tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1781-1789. |
[20] | WANG Q, GAO J, XING J, et al. DCFNet: discriminant correlation filters network for visual tracking[J]. arXiv:1704.04057, 2017. |
[21] | ABDELPAKEY M, SHEHATA M, MOHAMED M. DensSiam: end-to-end densely-siamese network with self-attention model for object tracking[C]// LNCS 11241: Proceedings of the 13th International Symposium on Advances in Visual Computing, Las Vegas, Nov 19-21, 2018. Cham: Springer, 2018: 463-473. |