Survey on 3D Human Pose Estimation of Deep Learning

doi:10.3778/j.issn.1673-9418.2205070

Abstract

The goal of 3D human pose estimation is to predict information such as the 3D coordinates and angles of human joints and to construct a human body representation (such as a skeleton) for further analysis of human posture. With the continued advance of deep learning, a growing number of high-performance deep-learning-based 3D human pose estimation methods have been proposed. However, challenges remain, owing to occlusion of the human body in images and the large amount of training data required. This paper reviews a number of recent research papers, analyzes and compares the inference process and core elements of these methods, and, taking the type of input as the organizing perspective, gives a comprehensive account of recent deep-learning-based 3D human pose estimation methods. It also introduces the relevant datasets and evaluation metrics, and compares and analyzes the experimental results of selected models on the Human3.6M, Campus, and Shelf datasets. Finally, based on the findings of this survey, it discusses the difficulties and challenges currently facing 3D human pose estimation and its prospects for future development.

Key words: 3D human pose estimation, deep learning, neural networks, keypoint detection

WANG Shichen, HUANG Kai, CHEN Zhigang, ZHANG Wendong. Survey on 3D Human Pose Estimation of Deep Learning[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(1): 74-87.
