Journal of Frontiers of Computer Science and Technology, 2021, Vol. 15, Issue (4): 733-742. DOI: 10.3778/j.issn.1673-9418.2008051

• Graphics and Images •

A Multi-Modal Lightweight Graph Convolution Method for Human Skeleton-Based Action Recognition

苏江毅,宋晓宁,吴小俊,於东军   

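  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122
    2. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094
  • Publication date: 2021-04-01  Posted: 2021-04-02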

Skeleton Based Action Recognition Algorithm on Multi-modal Lightweight Graph Convolutional Network

SU Jiangyi, SONG Xiaoning, WU Xiaojun, YU Dongjun   

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
    2. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Online: 2021-04-01  Published: 2021-04-02

Abstract:

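Compared with traditional action recognition based on RGB video, skeleton-based action recognition is only slightly affected by factors such as illumination, viewpoint and background complexity. This has made it one of the main research directions in computer vision in recent years. However, current mainstream skeleton-based methods all suffer to some degree from excessive parameter counts, long running times and high computational complexity. As a result, they struggle to meet timeliness and accuracy requirements at the same time. To address these problems, a lightweight graph convolutional neural network that fuses multi-modal data is proposed. The network works in three steps:

1. Data from multiple information streams are fused through multi-modal data fusion.
2. A spatial stream module and a temporal stream module extract the spatial and temporal information of the fused data, respectively.
3. A fully connected layer produces the final classification result.

Tests on the action recognition datasets NTU60 RGB+D and NTU120 RGB+D show two results. The network outperforms several mainstream methods from the past two years in recognition accuracy, and it uses far fewer parameters than other mainstream methods. This verifies that it achieves excellent accuracy while balancing timeliness and computational cost.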
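The abstract does not say which information streams are fused or how they are combined. The sketch below illustrates only the first step, fusing several streams into one input. It assumes the joint, bone and motion streams commonly used in skeleton-based recognition, fused early by concatenating each joint's features along the channel axis.

The toy skeleton `PARENTS` and the functions `bone_stream`, `motion_stream` and `fuse_streams` are hypothetical names, not taken from the paper.

```python
# Hedged sketch (assumption): joint/bone/motion streams fused by channel concatenation.
from typing import List, Optional

Vec3 = List[float]   # (x, y, z) coordinates or features of one joint
Frame = List[Vec3]   # all joints in one frame
Clip = List[Frame]   # frames in temporal order

# Hypothetical 5-joint toy skeleton: PARENTS[v] is the parent of joint v, None marks the root
# (0 = pelvis, 1 = neck, 2 = head, 3 = left hand, 4 = right hand; NTU RGB+D itself uses 25 joints).
PARENTS: List[Optional[int]] = [None, 0, 1, 1, 1]


def bone_stream(clip: Clip, parents: List[Optional[int]]) -> Clip:
    """Bone vector per joint: joint coordinates minus parent coordinates (zero for the root)."""
    out: Clip = []
    for frame in clip:
        bones: Frame = []
        for v, joint in enumerate(frame):
            p = parents[v]
            if p is None:
                bones.append([0.0, 0.0, 0.0])
            else:
                bones.append([a - b for a, b in zip(joint, frame[p])])
        out.append(bones)
    return out


def motion_stream(clip: Clip) -> Clip:
    """Displacement of each joint to the next frame; the last frame is zero-padded."""
    out: Clip = []
    for t, frame in enumerate(clip):
        if t + 1 < len(clip):
            out.append([[b - a for a, b in zip(j0, j1)] for j0, j1 in zip(frame, clip[t + 1])])
        else:
            out.append([[0.0] * len(j) for j in frame])
    return out


def fuse_streams(*streams: Clip) -> Clip:
    """Early fusion: concatenate each joint's features from every stream along the channel axis."""
    fused: Clip = []
    for frames in zip(*streams):  # the same frame taken from every stream
        fused.append([sum(joints, []) for joints in zip(*frames)])  # the same joint from every stream
    return fused


# Usage: a two-frame clip in which both hands rise by one unit.
joints: Clip = [
    [[0, 0, 0], [0, 1, 0], [0, 2, 0], [-1, 2, 0], [1, 2, 0]],
    [[0, 0, 0], [0, 1, 0], [0, 2, 0], [-1, 3, 0], [1, 3, 0]],
]
fused = fuse_streams(joints, bone_stream(joints, PARENTS), motion_stream(joints))
print(len(fused), len(fused[0]), len(fused[0][0]))  # 2 5 9
```

Concatenation keeps a single network input, which suits a lightweight design. Whether the paper fuses its streams at the input or at a later stage is not stated in the abstract.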

Key words: action recognition, human skeleton, lightweight, graph convolution

Abstract:

Compared with traditional RGB-based methods, skeleton-based action recognition methods are far less affected by factors such as illumination, viewing angle and background complexity. This has made them one of the main research directions in computer vision in recent years. However, current skeleton-based methods still suffer from large parameter counts, long running times and high computational complexity. These problems make it difficult to meet efficiency and accuracy requirements simultaneously. To address these issues, a lightweight graph convolutional network based on multi-modal data fusion is proposed. The network works in three steps:

1. Multiple information streams are fused using a multi-modal data fusion method.
2. A spatial stream module and a temporal stream module extract the spatial and temporal information of the fused data, respectively.
3. A fully connected layer produces the classification results.

Experiments on two widely used datasets, NTU60 RGB+D and NTU120 RGB+D, show two results. The proposed network outperforms several mainstream methods from the past two years in recognition accuracy, and it has far fewer parameters. This verifies that it achieves excellent accuracy while keeping time efficiency and computational cost in check.
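The abstract lists three stages after fusion: a spatial stream module, a temporal stream module and a fully connected classifier. They can be illustrated with a single forward pass in plain Python.

This is a hedged sketch, not the authors' network. The following are all assumptions made for illustration:

- the normalized adjacency D^{-1/2}(A+I)D^{-1/2}, which is the standard GCN propagation rule;
- one shared temporal filter;
- global average pooling before the classifier;
- every function name and toy weight.

```python
# Hedged sketch (assumption): spatial GCN -> temporal filter -> pooling + fully connected layer.
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows x columns


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(num_joints: int, edges: List[Tuple[int, int]]) -> Matrix:
    """D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph (standard GCN rule)."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def spatial_gcn(x: Matrix, a_hat: Matrix, weight: Matrix) -> Matrix:
    """Spatial step on one frame: ReLU(A_hat @ X @ W), where X is joints x in_channels."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, x), weight)]


def temporal_conv(seq: List[Matrix], kernel: List[float]) -> List[Matrix]:
    """Temporal step: one 1-D filter slid over time for every (joint, channel) pair.

    Zero padding keeps the sequence length; as in deep-learning conv layers this is a
    cross-correlation (the kernel is not flipped)."""
    pad = len(kernel) // 2
    out: List[Matrix] = []
    for t in range(len(seq)):
        acc = [[0.0] * len(seq[0][0]) for _ in seq[0]]
        for i, w in enumerate(kernel):
            s = t + i - pad
            if 0 <= s < len(seq):
                for v, row in enumerate(seq[s]):
                    for c, value in enumerate(row):
                        acc[v][c] += w * value
        out.append(acc)
    return out


def classify(seq: List[Matrix], fc_weight: Matrix, fc_bias: List[float]) -> List[float]:
    """Global average pooling over time and joints, then a fully connected layer -> class logits."""
    n = len(seq) * len(seq[0])
    pooled = [sum(frame[v][c] for frame in seq for v in range(len(frame))) / n
              for c in range(len(seq[0][0]))]
    return [sum(w * p for w, p in zip(row, pooled)) + b for row, b in zip(fc_weight, fc_bias)]


def forward(clip: List[Matrix], a_hat: Matrix, gcn_weight: Matrix, kernel: List[float],
            fc_weight: Matrix, fc_bias: List[float]) -> List[float]:
    """Spatial stream on every frame, temporal stream across frames, then the classifier."""
    spatial = [spatial_gcn(frame, a_hat, gcn_weight) for frame in clip]
    return classify(temporal_conv(spatial, kernel), fc_weight, fc_bias)


# Usage: a 3-joint chain (0-1-2), two frames, two input channels, two classes.
a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
clip = [[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]] * 2
identity = [[1.0, 0.0], [0.0, 1.0]]
logits = forward(clip, a_hat, identity, [1.0], identity, [0.0, 0.0])
print(logits)
```

A lightweight network would stack a few such spatial and temporal layers with learned weights. The single layer here only shows how data flows through the three stages named in the abstract.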

Key words: action recognition, human skeleton, lightweight, graph convolutional network