[1] GUO C, ZUO X, WANG S, et al. Action2Motion: conditioned generation of 3D human motions[C]//Proceedings of the 28th ACM International Conference on Multimedia, Seattle, Oct 12-16, 2020. New York: ACM, 2020: 2021-2029.
[2] CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv:1412.3555, 2014.
[3] PETROVICH M, BLACK M J, VAROL G. Action-conditioned 3D human motion synthesis with transformer VAE[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 11-17, 2021. Piscataway: IEEE, 2021: 10965-10975.
[4] LI R, YANG S, ROSS D A, et al. AI choreographer: music conditioned 3D dance generation with AIST++[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 11-17, 2021. Piscataway: IEEE, 2021: 13381-13392.
[5] KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Piscataway: IEEE, 2018: 7122-7131.
[6] KOCABAS M, ATHANASIOU N, BLACK M J. VIBE: video inference for human body pose and shape estimation[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 5252-5262.
[7] MAHMOOD N, GHORBANI N, TROJE N F, et al. AMASS: archive of motion capture as surface shapes[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 5441-5450.
[8] LUO Z, GOLESTANEH S A, KITANI K M. 3D human motion estimation via motion compression and refinement[C]//Proceedings of the 15th Asian Conference on Computer Vision, Kyoto, Nov 30-Dec 4, 2020. Cham: Springer, 2020: 324-340.
[9] KINGMA D P, WELLING M. Auto-encoding variational Bayes[J]. arXiv:1312.6114, 2013.
[10] HUA G G, LI L H, LIU S G. Multipath affinage stacked-hourglass networks for human pose estimation[J]. Frontiers of Computer Science, 2020, 14(4): 144701.
[11] LIU S, LI Y, HUA G. Human pose estimation in video via structured space learning and halfway temporal evaluation[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(7): 2029-2038.
[12] 吉斌, 潘烨, 金小刚, 等. 用于视频流人体姿态估计的时空信息感知网络[J]. 计算机辅助设计与图形学学报, 2022, 34(2): 189-197.
JI B, PAN Y, JIN X G, et al. Spatiotemporal neural network for video-based pose estimation[J]. Journal of Computer-Aided Design and Computer Graphics, 2022, 34(2): 189-197.
[13] BARSOUM E, KENDER J, LIU Z. HP-GAN: probabilistic 3D human motion prediction via GAN[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Piscataway: IEEE, 2018: 1499-1508.
[14] DUAN Y, SHI T, ZOU Z, et al. Single-shot motion completion with transformer[J]. arXiv:2103.00776, 2021.
[15] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, Long Beach, Dec 4-9, 2017. Cambridge: MIT Press, 2017: 5999-6009.
[16] ZHANG Y, BLACK M J, TANG S. We are more than our joints: predicting how 3D bodies move[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, Jun 19-25, 2021. Piscataway: IEEE, 2021: 3371-3381.
[17] AHN H, HA T, CHOI Y, et al. Text2Action: generative adversarial synthesis from language to action[C]//Proceedings of the 2018 IEEE International Conference on Robotics and Automation, Brisbane, May 21-25, 2018. Piscataway: IEEE, 2018: 5915-5920.
[18] AHUJA C, MORENCY L P. Language2Pose: natural language grounded pose forecasting[C]//Proceedings of the 2019 International Conference on 3D Vision, Québec City, Sep 16-19, 2019. Piscataway: IEEE, 2019: 719-728.
[19] LEE H Y, YANG X, LIU M Y, et al. Dancing to music[C]//Advances in Neural Information Processing Systems 32, Vancouver, Dec 8-14, 2019. Cambridge: MIT Press, 2019: 1-11.
[20] LI J, YIN Y, CHU H, et al. Learning to generate diverse dance motions with transformer[J]. arXiv:2008.08171, 2020.
[21] LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34(6): 1-16.
[22] ZHOU Y, BARNES C, LU J, et al. On the continuity of rotation representations in neural networks[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 5738-5746.
[23] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, Jun 2-7, 2019. Stroudsburg: ACL, 2019: 4171-4186.
[24] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[J]. arXiv:2010.11929, 2020.
[25] FANG L, ZENG T, LIU C, et al. Transformer-based conditional variational autoencoder for controllable story generation[J]. arXiv:2101.00828, 2021.
[26] VASA L, SKALA V. A perception correlated comparison method for dynamic meshes[J]. IEEE Transactions on Visualization and Computer Graphics, 2011, 17(2): 220-230.
[27] JI Y, XU F, YANG Y, et al. A large-scale RGB-D database for arbitrary-view human action recognition[C]//Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Oct 22-26, 2018. New York: ACM, 2018: 1510-1518.