Action Recognition Method on Regional Association Adaptive Graph Convolution

doi:10.3778/j.issn.1673-9418.2010070

Abstract

Skeleton-based action recognition has attracted extensive research because it adapts well to dynamic environments and complex backgrounds. Describing the human skeleton with graph convolutional networks yields good recognition results, but the graph topology is usually set by hand and stays fixed across all layers and input samples. Such networks also capture only the local physical relationships between joints and miss correlations between joints that are not physically connected. This paper proposes a skeleton-based action recognition method built on a regional association adaptive graph convolutional network (RA-AGCN). Adaptive graph convolution trains and updates the parameterized global graph, the per-sample data graph, and the convolution parameters separately in each layer, which makes the graph structure more flexible and the model more general across data samples. Regional association graph convolution captures the correlations between non-physically connected joints across data frames by alternately passing information between joint features and connection features. Second-order skeleton data (bones) supplement the original joint data, and the two are fused into a two-stream network to improve recognition performance. Experiments on the large-scale NTU-RGBD dataset show that the model improves action recognition accuracy, with the two-stream RA-AGCN reaching 95.62% Top-1 accuracy (Table 3).

Key words: adaptation, regional association, two-stream network, graph convolution
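
The second-order skeleton data mentioned in the abstract can be illustrated with a small sketch. It assumes that a bone is the vector difference between two physically connected joints, which the abstract does not spell out. The five-joint skeleton and the names `TOY_BONE_PAIRS` and `joints_to_bones` are hypothetical and are not the NTU-RGBD joint definition of Fig.7.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton, NOT the NTU-RGBD definition.
# Each pair is (child_joint, parent_joint); joint 0 acts as the root.
TOY_BONE_PAIRS = [(1, 0), (2, 1), (3, 1), (4, 0)]


def joints_to_bones(joints, bone_pairs):
    """Return bone vectors (child minus parent) with the same shape as joints.

    joints: array of shape (frames, num_joints, 3) holding xyz coordinates.
    The root joint has no parent, so its bone stays a zero vector.
    """
    bones = np.zeros_like(joints)
    for child, parent in bone_pairs:
        bones[:, child] = joints[:, child] - joints[:, parent]
    return bones


rng = np.random.default_rng(0)
joints = rng.normal(size=(2, 5, 3))  # 2 frames, 5 joints, xyz
bones = joints_to_bones(joints, TOY_BONE_PAIRS)

# Bone length per frame and joint; direction is kept in the vector itself.
bone_lengths = np.linalg.norm(bones, axis=-1)
```

Keeping the bone array the same shape as the joint array means a second stream can, under this assumption, reuse the same network input layout as the joint stream.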

CLC Number:

TP391

MA Li, ZHENG Shiyu, NIU Bin. Action Recognition Method on Regional Association Adaptive Graph Convolution[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 898-908.

马利, 郑诗雨, 牛斌. 应用区域关联自适应图卷积的动作识别方法[J]. 计算机科学与探索, 2022, 16(4): 898-908.

Figures/Tables 16

Fig.1 Time-space skeleton diagram of ST-GCN

Fig.2 Module structure of $AGC_k$

Fig.3 Module structure of RAGC

Fig.4 Module structure of RA-AGC

Fig.5 Module structure of RA-AGCN

Fig.6 Two-stream network structure

Fig.7 NTU-RGBD joint natural connection definition

Table 1 Research on effectiveness of $A_k$ and $D_k$ in adaptive graph convolution (%)

Method	Top-1	Top-5
RA-AGCN (joint) - $A_k$	93.52	99.20
RA-AGCN (joint) - $D_k$	92.29	99.01
RA-AGCN (joint)	94.82	99.37
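
Table 1 ablates the two graph terms of the adaptive graph convolution. The following is a minimal sketch of how such terms might be combined, assuming $A_k$ is a learnable global graph shared by all samples and $D_k$ is a per-sample graph computed as a softmax over embedded feature similarity. This form is an assumption and is not taken from the paper; all function and parameter names are hypothetical, and only a NumPy forward pass for a single frame is shown.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def data_graph(x, w_theta, w_phi):
    """Assumed form of D_k: row-normalised similarity of embedded joints."""
    theta = x @ w_theta  # (num_joints, embed_channels)
    phi = x @ w_phi      # (num_joints, embed_channels)
    return softmax(theta @ phi.T, axis=-1)


def adaptive_graph_conv(x, a_k, w_theta, w_phi, w_out, use_a=True, use_d=True):
    """Aggregate joint features over the graph A_k + D_k, then project.

    x: (num_joints, in_channels) features of one frame of one sample.
    use_a / use_d switch the two terms off, mimicking an ablation.
    """
    n = x.shape[0]
    graph = np.zeros((n, n))
    if use_a:
        graph = graph + a_k
    if use_d:
        graph = graph + data_graph(x, w_theta, w_phi)
    return graph @ x @ w_out


rng = np.random.default_rng(0)
num_joints, c_in, c_embed, c_out = 5, 3, 4, 8
x = rng.normal(size=(num_joints, c_in))
a_k = np.eye(num_joints)  # placeholder initial value for the global graph
w_theta = rng.normal(size=(c_in, c_embed))
w_phi = rng.normal(size=(c_in, c_embed))
w_out = rng.normal(size=(c_in, c_out))

full = adaptive_graph_conv(x, a_k, w_theta, w_phi, w_out)
without_a = adaptive_graph_conv(x, a_k, w_theta, w_phi, w_out, use_a=False)
without_d = adaptive_graph_conv(x, a_k, w_theta, w_phi, w_out, use_d=False)
```

In a trained model `a_k` and the weight matrices would be updated by backpropagation in every layer; here they are fixed random values used only to show the shapes.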

Fig.8 Comparative study on effectiveness of adaptive graph convolution

Fig.9 Virtual representation of regional correlation strength of different actions

Fig.10 Visual representation of regional correlation strength

Table 2 Research on importance of regional association graph convolution (%)

Method	Top-1	Top-5
AGCN (bone)	85.98	97.42
RA-AGCN (bone)	93.21	99.22
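
The Top-1 and Top-5 columns in these tables are standard top-k accuracies. As a sketch of how such a figure is computed from class scores, with made-up scores and labels that are not results from the paper:

```python
import numpy as np


def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores.

    scores: (num_samples, num_classes); labels: (num_samples,) integer classes.
    """
    top_k = np.argsort(scores, axis=1)[:, -k:]  # indices of the k largest scores
    hits = (top_k == labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()


# Illustrative scores for 4 samples and 3 classes.
scores = np.array([
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
])
labels = np.array([1, 1, 2, 2])

top1 = top_k_accuracy(scores, labels, k=1)  # 50.0
top2 = top_k_accuracy(scores, labels, k=2)  # 75.0
```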

Fig.11 Comparative study on effectiveness of regional association graph convolution

Table 3 Research on importance of two-stream network (%)

Method	Top-1	Top-5
RA-AGCN (joint)	94.82	99.37
RA-AGCN (bone)	93.21	99.22
RA-AGCN	95.62	99.45
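
The last row of Table 3 combines the joint and bone streams. The abstract says only that the two are merged, so the sketch below assumes one common choice, summing the softmax scores of the two streams and taking the highest fused score. The function name `fuse_two_streams` and the logits are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_two_streams(joint_logits, bone_logits):
    """Assumed fusion rule: add the per-class softmax scores of both streams."""
    return softmax(joint_logits) + softmax(bone_logits)


# Illustrative logits for 2 samples and 3 classes.
joint_logits = np.array([[2.0, 1.0, 0.0],
                         [0.0, 1.0, 0.9]])
bone_logits = np.array([[1.5, 1.0, 0.2],
                        [0.0, 0.2, 2.0]])

fused = fuse_two_streams(joint_logits, bone_logits)
predictions = fused.argmax(axis=1)
joint_only = joint_logits.argmax(axis=1)
```

In the second sample the joint stream alone picks class 1, while the confident bone stream shifts the fused prediction to class 2, which is the kind of complementarity a two-stream design relies on.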

Fig.12 Comparative study on effectiveness of two-stream network

Table 4 Comparison of RA-AGCN with recent methods (%)

Method	Top-1
Deep LSTM [9]	67.3
ST-LSTM [13]	77.7
TCN [13]	83.1
ST-GCN [15]	88.3
DPRL [16]	89.8
AS-GCN [21]	94.2
2S-AGCN [17]	95.1
RA-AGCN	95.6

References 25

[1]	钱慧芳, 易剑平, 付云虎. 基于深度学习的人体动作识别综述[J]. 计算机科学与探索, 2021, 15(3):438-455.
	QIAN H F, YI J P, FU Y H. Review of human action recognition based on deep learning[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(3):438-455.
[2]	孙磊. 基于时空关系分析的监控视频下行为识别技术研究[D]. 合肥: 安徽大学, 2019.
	SUN L. Research on behavior recognition technology in surveillance video based on analysis of time and space relationship[D]. Hefei: Anhui University, 2019.
[3]	李屹萌. 面向仿生机械手的表面肌电信号检测与模式识别研究[D]. 哈尔滨: 哈尔滨工业大学, 2019.
	LI Y M. Research on surface EMG signal detection and pattern recognition for bionic manipulator[D]. Harbin: Harbin Institute of Technology, 2019.
[4]	高立青. 治安监控视频大数据中的行人行为识别方法[D]. 大连: 大连理工大学, 2017.
	GAO L Q. Pedestrian behavior identification method in public security surveillance video big data[D]. Dalian: Dalian University of Technology, 2017.
[5]	LI C, ZHONG Q Y, XIE D, et al. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Jul 13-19, 2018. New York: ACM, 2018: 786-792.
[6]	YAN Y C, XU J W, NI B B, et al. Skeleton-aided articulated motion generation[C]// Proceedings of the 25th ACM on Multimedia Conference, Mountain View, Oct 23-27, 2017. New York: ACM, 2017: 199-207.
[7]	CAO Z, SIMON T, WEI S E, et al. Realtime multi-person 2D pose estimation using part affinity fields[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 1302-1310.
[8]	DU Y, WANG W, WANG L. Hierarchical recurrent neural network for skeleton based action recognition[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 1110-1118.
[9]	SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 1010-1019.
[10]	祁大健, 杜慧敏, 张霞, 等. 基于上下文特征融合的行为识别算法[J]. 计算机工程与应用, 2020, 56(2):171-175.
	QI D J, DU H M, ZHANG X, et al. Behavior recognition algorithm based on context feature fusion[J]. Computer Engineering and Applications, 2020, 56(2):171-175.
[11]	董旭, 谭励, 周丽娜, 等. 联合场景和行为特征的短视频行为识别[J]. 计算机科学与探索, 2020, 14(10):1754-1761.
	DONG X, TAN L, ZHOU L N, et al. Short video behavior recognition combining scene and behavior features[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(10):1754-1761.
[12]	SONG S J, LAN C L, XING J L, et al. An end-to-end spatio-temporal attention model for human action recognition from skeleton data[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, Feb 4-9, 2017. Menlo Park: AAAI, 2017: 4263-4270.
[13]	KIM T S, REITER A. Interpretable 3D human action analysis with temporal convolutional networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 1623-1631.
[14]	LIU M Y, LIU H, CHEN C. Enhanced skeleton visualization for view invariant human action recognition[J]. Pattern Recognition, 2017, 68:346-362.
[15]	YAN S J, XIONG Y J, LIN D H. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence, the 30th Innovative Applications of Artificial Intelligence, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, Feb 2-7, 2018. Menlo Park: AAAI, 2018: 7444-7452.
[16]	TANG Y S, TIAN Y, LU J W, et al. Deep progressive reinforcement learning for skeleton-based action recognition[C]// Proceedings of the 2018 Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-22, 2018. Washington: IEEE Computer Society, 2018: 5323-5332.
[17]	SHI L, ZHANG Y F, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 12026-12035.
[18]	SHI L, ZHANG Y F, CHENG J, et al. Skeleton-based action recognition with directed graph neural networks[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 7912-7921.
[19]	THAKKAR K C, NARAYANAN P J. Part-based graph convolutional network for action recognition[C]// Proceedings of the British Machine Vision Conference 2018, Newcastle, Sep 3-6, 2018. London: BMVA Press, 2018: 270.
[20]	LI M S, CHEN S H, CHEN X, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 3595-3603.
[21]	LI M S, CHEN S H, CHEN X, et al. Symbiotic graph neural networks for 3D skeleton-based human action recognition and motion prediction[J]. arXiv:1910.02212, 2019.
[22]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778.
[23]	WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-23, 2018. Washington: IEEE Computer Society, 2018: 7794-7803.
[24]	JANG E, GU S X, POOLE B. Categorical reparameterization with Gumbel-Softmax[C]// Proceedings of the 5th International Conference on Learning Representations, Toulon, Apr 24-26, 2017: 1-13.
[25]	LIU J, SHAHROUDY A, XU D, et al. Spatio-temporal LSTM with trust gates for 3D human action recognition[C]// LNCS 9907: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Oct 11-14, 2016. Cham: Springer, 2016: 816-833.