计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2022, Vol. 16, Issue (5): 991-1007. DOI: 10.3778/j.issn.1673-9418.2110022
Corresponding author: SONG Zhen, e-mail: songzhen@zhongxi.cn
YANG Gang1, ZHANG Yushu1, SONG Zhen2,+
Received:
2021-10-13
Revised:
2022-01-06
Online:
2022-05-01
Published:
2022-05-19
About author:
YANG Gang, born in 1977, Ph.D., associate professor, member of CCF. His research interests include computer graphics and virtual reality.
Abstract: Human action recognition and action evaluation have become hot research topics in recent years. The two have much in common in data types, data processing, feature description, and other respects. With the marked growth of application demand, a large body of work on action recognition and evaluation has appeared, but the differences and connections between the two, as well as their theoretical methods and technical routes, have not yet been systematically analyzed and summarized. Starting from application purposes and technical characteristics, this paper discusses the connections between the two and gives relatively clear definitions of both concepts. On this basis, from the perspective of the data processing pipeline, action recognition and action evaluation are placed within a unified technical framework. Following this framework, the research progress and open problems of each key stage involved in action recognition and evaluation, including data types, preprocessing, feature description, classification methods, and evaluation methods, are systematically reviewed. For the classification stage, current action recognition methods are divided into statistical-model-based and deep-learning-based methods. For the evaluation stage, current work on action evaluation is divided into four categories according to how expert knowledge is introduced, and is systematically surveyed. Finally, current bottlenecks and key directions for future research are summarized and discussed.
YANG Gang, ZHANG Yushu, SONG Zhen. Human Action Recognition and Evaluation—Differences, Connections and Research Progress[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(5): 991-1007.
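The unified pipeline described in the abstract (data, preprocessing, feature description, then classification or evaluation) can be sketched in miniature as below. This is an illustrative sketch only; all function names and the toy feature are placeholders, not code or features from the surveyed work.

```python
import numpy as np

def preprocess(frames):
    """Center each frame on joint 0 (e.g., the hip) to normalize position."""
    frames = np.asarray(frames, dtype=float)
    return frames - frames[:, :1, :]

def describe(frames):
    """Toy feature vector: per-joint mean position plus total motion energy."""
    mean_pose = frames.mean(axis=0).ravel()
    energy = np.abs(np.diff(frames, axis=0)).sum()
    return np.concatenate([mean_pose, [energy]])

def recognize(feature, prototypes):
    """Recognition branch: pick the nearest class prototype in feature space."""
    return min(prototypes, key=lambda k: np.linalg.norm(feature - prototypes[k]))

def evaluate(feature, expert_feature):
    """Evaluation branch: similarity score (1 = identical) to an expert template."""
    return 1.0 / (1.0 + np.linalg.norm(feature - expert_feature))

# Toy sequence: 3 frames x 2 joints x 3 coordinates, joint 1 raising along y.
clip = [[[0, 0, 0], [1, 0, 0]],
        [[0, 0, 0], [1, 1, 0]],
        [[0, 0, 0], [1, 2, 0]]]
feat = describe(preprocess(clip))
prototypes = {"raise": feat, "still": np.zeros_like(feat)}
```

The point of the sketch is that recognition and evaluation share every stage up to the feature vector; only the final step differs (nearest-class decision versus expert-similarity score), which is the structural observation the unified framework rests on.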
Table 1 Commonly used publicly available action recognition datasets
Dataset | Year | Classes | Samples | Modality | Description
---|---|---|---|---|---
UCF101 [27] | 2012 | 101 | 13 320 | RGB | Clips collected from BBC/ESPN broadcast channels and YouTube
HMDB51 [28] | 2011 | 51 | 6 766 | RGB | Daily human behaviors collected from various Internet sources and digital videos
YouTube-8M | 2016 | 4 716 | 8 000 000 | RGB | 8 million YouTube videos with video-level annotations labeled with 4 800 Knowledge Graph entities
MuHAVi [29] | 2010 | 17 | 1 904 | RGB | Human action videos, including manually annotated silhouette data
ActivityNet | 2016 | 200 | 20 000 | RGB | 200 different daily activities, about 700 h of video in total, averaging 1.5 action annotations per video
MSR Action 3D [30] | 2010 | 20 | 567 | RGB-D & skeleton | 20 actions, each performed 2-3 times by 10 subjects; 567 depth map sequences at 640×240 resolution
NTU RGB+D [31] | 2016 | 60 | 56 000 | RGB & RGB-D & skeleton | Three main categories: (1) daily actions; (2) health-related actions; (3) two-person interactions
NTU RGB+D 120 [32] | 2019 | 120 | 114 480 | RGB & RGB-D & skeleton | Three main categories: (1) daily actions; (2) health-related actions; (3) two-person interactions
G3D [33] | 2012 | 20 | 10 | RGB & RGB-D & skeleton | A series of gaming actions captured with Microsoft Kinect
Table 2 Summary of action classification methods
Category | Method type | Representative work and methods | Pros and cons
---|---|---|---
Statistical models | Template matching | ASM; AAM; MHI; MEI; 2D mesh-template feature matching; DTW (dynamic time warping) | Simple to implement and computationally cheap, but low accuracy and poor robustness
Statistical models | State space: HMMs | HMMs; HHMMs; S-HSMM; two-layer HMM with multi-scale features | Higher accuracy, but poor robustness and high computational cost
Statistical models | State space: DBN | Du et al. [65] | Higher accuracy and moderate computational cost, but high design complexity and poor robustness
Statistical models | Support vector machines | Pontil et al. [69] | High accuracy and low design complexity, but poor robustness and hard to scale to large training sets
Deep learning | CNN | Mohamed et al. [72] | Very high accuracy, strong robustness, handles high-dimensional data well, but computationally expensive and needs parameter tuning
Deep learning | Two-stream networks | Simonyan et al. [76] | Very high accuracy and strong robustness, but computationally expensive and slow
Deep learning | CNN-LSTM | Donahue et al. [79] (LRCN); unsupervised LSTM; LSCN [80] | Very high accuracy, strong robustness, and fast
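Among the template-matching methods in the table above, DTW (dynamic time warping) is the simplest to illustrate: it aligns two action sequences of different speeds and lengths, and a query is labeled with the class of its nearest template. A minimal sketch, with made-up 1-D "joint trajectory" templates purely for illustration:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: insertion, deletion, match (the warping step).
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Classify a query against one template per class (nearest template wins).
templates = {"wave": np.array([0.0, 1.0, 2.0, 1.0, 0.0]),
             "push": np.array([0.0, 0.0, 1.0, 2.0, 2.0])}
query = np.array([0.0, 1.0, 1.8, 1.1, 0.1])
label = min(templates, key=lambda k: dtw_distance(query, templates[k]))
```

This captures why the table rates template matching as simple but brittle: the O(nm) alignment is cheap and easy to implement, but a single template per class cannot absorb viewpoint or performer variation.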
Table 3 Summary of action evaluation methods
Method category | Work | Evaluated activity | Criterion/method
---|---|---|---
Visualization tools for action evaluation | Chen [14] | Golf swing | Joint angles
Visualization tools for action evaluation | Li [15] | Badminton swing | Chebyshev distance
Visualization tools for action evaluation | Wang [19] | Peking opera | Separate scoring by experts and by machine
Expert knowledge in feature description | Chen [14] | Golf swing | Joint angles
Expert knowledge in feature description | Zhang et al. [82] | Competitive aerobics | Human-body kinematics
Expert knowledge in feature description | Alexiadis et al. [83] | Dance | Quaternion features
Expert knowledge in feature description | Patrona et al. [26] | Medical training | Dynamic weighting, kinetic-energy descriptors
Action standards built from expert knowledge | Li [84] | Developmental coordination disorder | CNN with temporal-domain filtering
Action standards built from expert knowledge | Richter et al. [85] | Hip abduction, extension and flexion | Rule-based and label-based
Action standards built from expert knowledge | Xu [86] | 24-form Tai Chi | CCA
Big-data-based action evaluation | Lv et al. [16] | Gymnastics | Big data
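The joint-angle criterion that recurs in the table above (e.g., for golf-swing evaluation) can be sketched as follows: compute the angle at each monitored joint from 3-D skeleton points and count how many fall within an expert-defined tolerance. This is a minimal illustration of the general idea; the function names, the 15-degree tolerance, and the toy poses are all assumptions, not values from the cited systems.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by 3-D points a-b-c."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def score_pose(learner, expert, triples, tol=15.0):
    """Fraction of monitored joint angles within `tol` degrees of the expert."""
    hits = sum(
        abs(joint_angle(*learner[[a, b, c]]) - joint_angle(*expert[[a, b, c]])) <= tol
        for a, b, c in triples
    )
    return hits / len(triples)

# Toy poses: rows are shoulder, elbow, wrist; the monitored angle is at the elbow.
expert = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
learner = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.2, 0.9, 0.0]])
score = score_pose(learner, expert, [(0, 1, 2)])
```

Because angles are invariant to limb length and body position, this kind of criterion transfers across performers, which is one reason the surveyed visualization tools favor it over raw coordinate differences.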
[1] DURIC Z, GRAY W, HEISHMAN R, et al. Integrating perceptual and cognitive modeling for adaptive and intelligent human-computer interaction[J]. Proceedings of the IEEE, 2002, 90(7): 1272-1289.
[2] KWAK S, HAN B, HAN J H. Scenario-based video event recognition by constraint flow[C]// Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, Jun 20-25, 2011. Washington: IEEE Computer Society, 2011: 3345-3352.
[3] GAUR U, ZHU Y, SONG B, et al. A "string of feature graphs" model for recognition of complex activities in natural videos[C]// Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Nov 6-13, 2011. Washington: IEEE Computer Society, 2011: 2595-2602.
[4] PARK S, AGGARWAL J K. Recognition of two-person interactions using a hierarchical Bayesian network[C]// Proceedings of the 2003 ACM SIGMM International Workshop on Video Surveillance. New York: ACM, 2003: 65-76.
[5] JUNEJO I, DEXTER E, LAPTEV I, et al. View-independent action recognition from temporal self-similarities[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1): 172-185.
[6] THANGALI A, NASH J P, SCLAROFF S, et al. Exploiting phonological constraints for handshape inference in ASL video[C]// Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, Jun 20-25, 2011. Washington: IEEE Computer Society, 2011: 521-528.
[7] FAN J C, ZHOU G M. The research of gesture recognition based on Kinect skeleton tracking technology[J]. Journal of Anhui Agricultural Sciences, 2014, 42(11): 3444-3446.
[8] COOPER H, BOWDEN R. Large lexicon detection of sign language[C]// LNCS 4796: Proceedings of the 2007 IEEE International Workshop on Human-Computer Interaction, Rio de Janeiro, Oct 20, 2007. Berlin, Heidelberg: Springer, 2007: 88-97.
[9] CHANG Y J, CHEN S F, HUANG J D. A Kinect-based system for physical rehabilitation: a pilot study for young adults with motor disabilities[J]. Research in Developmental Disabilities, 2011, 32: 2566-2570.
[10] LI S B. Algorithm of human posture action recognition and imitation for robots[D]. Shanghai: Shanghai Jiaotong University, 2013.
[11] REHG J M, ABOWD G D, ROZGA A, et al. Decoding children's social behavior[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, Jun 23-28, 2013. Washington: IEEE Computer Society, 2013: 3414-3421.
[12] PRESTI L L, SCLAROFF S, ROZGA A. Joint alignment and modeling of correlated behavior streams[C]// Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Dec 1-8, 2013. Washington: IEEE Computer Society, 2013: 730-737.
[13] JOHANSSON G. Visual perception of biological motion and a model for its analysis[J]. Perception & Psychophysics, 1973, 14: 201-211.
[14] CHEN X M. An action evaluating system based on 3D human posture[D]. Hangzhou: Zhejiang University, 2018.
[15] LI K. Capture, recognition and analysis of badminton player's swing[D]. Chengdu: University of Electronic Science and Technology of China, 2017.
[16] LV M, WAN L C. Design of sports competition aided evaluation system based on big data and motion recognition algorithm[J]. Electronic Design Engineering, 2019, 27(16): 6-10.
[17] KAO C I, SPIRO I, SEUNGKYU L, et al. Dancing with Turks[C]// Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, Brisbane, Oct 26-30, 2015. New York: ACM, 2015: 241-250.
[18] SCOTT J, COLLINS R, FUNK C, et al. 4D model-based spatio-temporal alignment of scripted Taiji Quan sequences[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 795-804.
[19] WANG T J. Using 3D motion capture to study Chinese opera performance movements[J]. Research in Arts Education, 2018(35): 69-92.
[20] XU G Y, CAO Y Y. Action recognition and activity understanding: a review[J]. Journal of Image and Graphics, 2009, 14(2): 189-195.
[21] WU D, SHARMA N, BLUMENSTEIN M. Recent advances in video-based human action recognition using deep learning: a review[C]// Proceedings of the 2017 International Joint Conference on Neural Networks, Anchorage, May 14-19, 2017. Piscataway: IEEE, 2017: 2865-2872.
[22] PRESTI L L, MARCO L C. 3D skeleton-based human action classification: a survey[J]. Pattern Recognition, 2016, 53: 130-147.
[23] HUANG G F, LI Y. A survey of human action and pose recognition[J]. Computer Knowledge and Technology, 2013, 9(1): 133-135.
[24] TIAN Y, LI F D. Research review on human body gesture recognition based on depth data[J]. Computer Engineering and Applications, 2020, 56(4): 1-8.
[25] HUANG Q Q, ZHOU F Y, LIU M Z. Survey of human action recognition algorithms based on video[J]. Application Research of Computers, 2020, 37(11): 3213-3219.
[26] PATRONA F, CHATZITOFIS A, ZARPALAS D, et al. Motion analysis: action detection, recognition and evaluation based on motion capture data[J]. Pattern Recognition, 2018, 76: 612-622.
[27] SOOMRO K, ZAMIR A R, SHAH M. UCF101: a dataset of 101 human actions classes from videos in the wild[J]. arXiv:1212.0402, 2012.
[28] KUEHNE H, JHUANG H, GARROTE E, et al. HMDB: a large video database for human motion recognition[C]// Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Nov 6-13, 2011. Washington: IEEE Computer Society, 2011: 2556-2563.
[29] SINGH S, VELASTIN S A, RAGHEB H. MuHAVi: a multi-camera human action video dataset for the evaluation of action recognition methods[C]// Proceedings of the 7th IEEE International Conference on Advanced Video and Signal Based Surveillance, Washington, Aug 29-Sep 1, 2010. Washington: IEEE Computer Society, 2010: 48-55.
[30] LI W Q, ZHANG Z Y, LIU Z C, et al. Action recognition based on a bag of 3D points[C]// Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, Jun 13-18, 2010. Washington: IEEE Computer Society, 2010: 9-14.
[31] SHAHROUDY A, LIU J, TIAN-TSONG N G, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 1010-1019.
[32] LIU J, SHAHROUDY A, PEREZ M, et al. NTU RGB+D 120: a large-scale benchmark for 3D human activity understanding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 42: 2684-2701.
[33] BLOOM V, MAKRIS D, ARGYRIOU V. G3D: a gaming action dataset and real time action recognition evaluation framework[C]// Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Providence, Jun 16-21, 2012. Washington: IEEE Computer Society, 2012: 7-12.
[34] DABOV K, FOI A, KATKOVNIK V, et al. Image denoising by sparse 3D transform-domain collaborative filtering[J]. IEEE Transactions on Image Processing, 2007, 16: 2080-2095.
[35] MAGGIONI M, BORACCHI G, FOI A. Video denoising using separable 4D nonlocal spatiotemporal transforms[J]. Proceedings of SPIE-The International Society for Optical Engineering, 2011, 7870(3): 1-12.
[36] DAVY A, EHRET T, MOREL J M, et al. Non-local video denoising by CNN[J]. arXiv:1811.12758, 2018.
[37] ARIAS P, MOREL J M. Video denoising via empirical Bayesian estimation of space-time patches[J]. Journal of Mathematical Imaging & Vision, 2018, 60(1): 70-93.
[38] TASSANO M, DELON J, VEIT T. DVDNet: a fast network for deep video denoising[C]// Proceedings of the 2019 IEEE International Conference on Image Processing, Taipei, China, Sep 22-25, 2019. Piscataway: IEEE, 2019: 1805-1809.
[39] TASSANO M, DELON J, VEIT T. FastDVDnet: towards real-time deep video denoising without flow estimation[J]. arXiv:1907.01361v2, 2019.
[40] PING W, ZHENG N, ZHAO Y, et al. Concurrent action detection with structural prediction[C]// Proceedings of the 2013 International Conference on Computer Vision, Sydney, Dec 1-8, 2013. Washington: IEEE Computer Society, 2013: 3136-3143.
[41] WU D, SHAO L. Leveraging hierarchical parametric networks for skeletal joints based action segmentation and recognition[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Jun 23-28, 2014. Washington: IEEE Computer Society, 2014: 724-731.
[42] WANG C, WANG Y, YUILLE A L. An approach to pose-based action recognition[C]// Proceedings of the 2013 Conference on Computer Vision and Pattern Recognition, Portland, Jun 23-28, 2013. Washington: IEEE Computer Society, 2013: 915-922.
[43] SEDMIDUBSKY J, ELIAS P, BUDIKOVA P, et al. Content-based management of human motion data: survey and challenges[J]. IEEE Access, 2021, 9: 64241-64255.
[44] OJALA T, PIETIKÄINEN M, MÄENPÄÄ T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24: 971-987.
[45] TANG C, TANG L G, LIU B. A survey of image feature detecting and matching methods[J]. Journal of Nanjing University of Information Science & Technology, 2020, 12(3): 261-273.
[46] BOBICK A F, DAVIS J W. The recognition of human movement using temporal templates[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23: 257-267.
[47] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, Jun 20-25, 2005. Washington: IEEE Computer Society, 2005: 886-893.
[48] LAPTEV I. On space-time interest points[J]. International Journal of Computer Vision, 2005, 64(2/3): 107-123.
[49] LAPTEV I, MARSZALEK M, SCHMID C, et al. Learning realistic human actions from movies[C]// Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Anchorage, Jun 24-26, 2008. Washington: IEEE Computer Society, 2008: 1-8.
[50] WANG H, ULLAH M M, KLÄSER A, et al. Evaluation of local spatio-temporal features for action recognition[C]// Proceedings of the 2009 British Machine Vision Conference, London, Sep 7-10, 2009. London: The British Machine Vision Association, 2009: 1-11.
[51] BAUMANN J, WESSEL R, KRÜGER B, et al. Action graph: a versatile data structure for action recognition[C]// Proceedings of the 9th International Conference on Computer Graphics Theory and Applications, Lisbon, Jan 5-8, 2014: 325-334.
[52] BARNACHON M, BOUAKAZ S, BOUFAMA B, et al. A real-time system for motion retrieval and interpretation[J]. Pattern Recognition Letters, 2013, 34(15): 1789-1798.
[53] MASOOD S Z, TAPPEN M F, et al. Exploring the trade-off between accuracy and observational latency in action recognition[J]. International Journal of Computer Vision, 2013, 101(3): 420-436.
[54] MÜLLER M, RÖDER T, CLAUSEN M. Efficient content-based retrieval of motion capture data[J]. ACM Transactions on Graphics, 2005, 24(3): 677-685.
[55] CHERON G, LAPTEV I, SCHMID C. P-CNN: pose-based CNN features for action recognition[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 3218-3226.
[56] YAN S, XIONG Y, LIN D, et al. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence, the 30th Innovative Applications of Artificial Intelligence, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, Feb 2-7, 2018. Menlo Park: AAAI, 2018: 7444-7452.
[57] REN B, LIU M, DING R, et al. A survey on 3D skeleton-based action recognition using learning method[J]. arXiv:2002.05907, 2020.
[58] ZHANG P, XUE J, LAN C, et al. Adding attentiveness to the neurons in recurrent neural networks[C]// LNCS 11213: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 136-152.
[59] LANITIS A, TAYLOR C J, COOTES T F. Automatic interpretation and coding of face images using flexible models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(7): 743-756.
[60] COOTES T F, EDWARDS G J, TAYLOR C J. Active appearance models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(6): 681-685.
[61] BOBICK A F, WILSON A D. A state-based approach to the representation and recognition of gesture[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(12): 1325-1337.
[62] YAMATO J, OHYA J, ISHII K. Recognizing human action in time sequential images using hidden Markov model[C]// Proceedings of the 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Champaign, Jun 15-18, 1992. Washington: IEEE Computer Society, 1992: 379-385.
[63] NGUYEN N T, PHUNG D Q, VENKATESH S, et al. Learning and detecting activities from movement trajectories using the hierarchical hidden Markov model[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, Jun 20-26, 2005. Washington: IEEE Computer Society, 2005: 955-960.
[64] MEI X, HU S, XU S S, et al. Multi-scale feature based double-layer HMM and its application in behavior recognition[J]. CAAI Transactions on Intelligent Systems, 2012, 7(6): 512-517.
[65] DU Y T, CHEN F, XU W L, et al. Recognizing interaction activities using dynamic Bayesian network[C]// Proceedings of the 18th International Conference on Pattern Recognition, Hong Kong, China, Aug 20-24, 2006. Washington: IEEE Computer Society, 2006: 618-621.
[66] OLIVER N, HORVITZ E. A comparison of HMMs and dynamic Bayesian networks for recognizing office activities[C]// LNCS 3538: Proceedings of the 10th International Conference on User Modeling, Edinburgh, Jul 24-29, 2005. Berlin, Heidelberg: Springer, 2005: 199-209.
[67] ZHOU Z H. Machine learning[M]. Beijing: Tsinghua University Press, 2016.
[68] LI H. Statistical learning methods[M]. Beijing: Tsinghua University Press, 2012.
[69] PONTIL M, VERRI A. Support vector machines for 3D object recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(6): 637-646.
[70] MANZI A, CAVALLO F, DARIO P. A 3D human posture approach for activity recognition based on depth camera[C]// LNCS 9914: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 8-10, 15-16, 2016. Cham: Springer, 2016: 432-447.
[71] SCHÜLDT C, LAPTEV I, CAPUTO B. Recognizing human actions: a local SVM approach[C]// Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, Aug 23-26, 2004. Washington: IEEE Computer Society, 2004: 32-36.
[72] MOHAMED E, ISMAIL C, WASSIM B, et al. Posture recognition using an RGB-D camera: exploring 3D body modeling and deep learning approaches[C]// Proceedings of the 2018 IEEE Life Sciences Conference, Montreal, Oct 28-30, 2018. Piscataway: IEEE, 2018: 69-72.
[73] LIU S L, GU J H, WANG H Y, et al. Human behavior recognition based on associative partition and ST-GCN[J]. Computer Engineering and Applications, 2021, 57(13): 168-175.
[74] JI S W, XU W, YANG M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221-231.
[75] LI Y X, XIE L B. Human action recognition based on depth motion map and dense trajectory[J]. Computer Engineering and Applications, 2020, 56(3): 194-200.
[76] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]// Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, Dec 8-13, 2014: 568-576.
[77] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Convolutional two-stream network fusion for video action recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 1933-1941.
[78] SHI X B, LI Y Y, LIU F, et al. T-STAM: end-to-end action recognition model based on two-stream network with spatio-temporal attention mechanism[J]. Application Research of Computers, 2020, 38(3): 1235-1239.
[79] DONAHUE J, HENDRICKS L A, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 677-691.
[80] YANG K, WANG J Y, QI Q, et al. LSCN: concerning long and short sequence together for action recognition[J]. Acta Electronica Sinica, 2020, 48(3): 503-509.
[81] PENG W, HONG X P, CHEN H Y, et al. Learning graph convolutional network for skeleton-based human action recognition by neural searching[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 2669-2676.
[82] ZHANG X Y, LIU L, ZHAO X L. Kinematic analysis on different technical characteristics of C289 in aerobic gymnastics[J]. Journal of Beijing Sport University, 2017, 40(10): 99-105.
[83] ALEXIADIS D S, DARAS P. Quaternionic signal processing techniques for automatic evaluation of dance performances from MoCap data[J]. IEEE Transactions on Multimedia, 2014, 16(5): 1391-1406.
[84] LI R M. Research on fine classification and evaluation of human action based on visual data[D]. Beijing: University of Chinese Academy of Sciences, 2020.
[85] RICHTER J, WIEDE C, HEINKEL U, et al. Motion evaluation of therapy exercises by means of skeleton normalisation, incremental dynamic time warping and machine learning: a comparison of a rule-based and a machine-learning-based approach[C]// Proceedings of the 14th International Conference on Computer Vision Theory and Applications (VISIGRAPP), Prague, Feb 25-27, 2019: 497-504.
[86] XU Z. Taiji boxing assist teaching and evaluation method based on whole body motion capture[D]. Zhengzhou: Zhengzhou University, 2018.