计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2024, Vol. 18, Issue 4: 963-977. DOI: 10.3778/j.issn.1673-9418.2302065

• Graphics and Image •

预加权调制密集图卷积网络三维人体姿态估计

马金林,崔琦磊,马自萍,闫琦,曹浩杰,武江涛   

  1. 北方民族大学 计算机科学与工程学院,银川 750021
  2. 图像图形智能信息处理国家民委重点实验室,银川 750021
  3. 北方民族大学 数学与信息科学学院,银川 750021
  • Publication date: 2024-04-01  Release date: 2024-04-01

Pre-weighted Modulated Dense Graph Convolutional Networks for 3D Human Pose Estimation

MA Jinlin, CUI Qilei, MA Ziping, YAN Qi, CAO Haojie, WU Jiangtao   

  1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  2. Key Laboratory of the National Ethnic Affairs Commission for Intelligent Processing of Image and Graphics, Yinchuan 750021, China
  3. College of Mathematics and Information Science, North Minzu University, Yinchuan 750021, China
  • Online: 2024-04-01  Published: 2024-04-01

Abstract: Graph convolutional networks (GCNs) have become a major research focus in 3D human pose estimation (3D HPE), and modeling the relationships between human joints with a GCN yields good performance. However, GCN-based 3D HPE methods suffer from over-smoothing and do not distinguish the importance of a joint from that of its adjacent joints. To address these problems, this paper designs a modulated dense connection (MDC) module and a pre-weighted graph convolution module, and on this basis proposes the pre-weighted modulated dense graph convolutional network (WMDGCN) for 3D human pose estimation. To counter over-smoothing, the modulated dense connection achieves better feature reuse through two hyperparameters, α and β: α sets the weight ratio between the features of layer l and the aggregated features of all previous layers, and β sets the strategy by which the features of previous layers are propagated to layer l. This improves the expressive power of the features. To distinguish a joint from its neighbors, the pre-weighted graph convolution assigns a higher weight to the current joint and applies different weight matrices to the current joint and to its adjacent joints, capturing joint features more effectively. Comparative experiments on the Human3.6M dataset show that the proposed method obtains the best results among the compared methods in both parameter count and accuracy: WMDGCN has 0.27 MB of parameters, an MPJPE of 37.46 mm, and a P-MPJPE of 28.85 mm.
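The two modules described in the abstract can be pictured with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names, the `self_weight` scaling, the mean aggregation over neighbors, and the reading of β as a decay factor over earlier layers are all hypothetical and may differ from the actual WMDGCN design.

```python
# Illustrative sketch only. The abstract states no equations, so every
# formula and name here is an assumption, not the authors' WMDGCN.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def pre_weighted_graph_conv(h, adj, w_self, w_nbr, self_weight=2.0):
    """One graph-convolution layer in which a joint and its neighbors use
    different weight matrices, and the joint's own term is scaled up by
    `self_weight` (a hypothetical parameter)."""
    n = len(adj)
    # Row-normalize the neighbor part of the adjacency (self-loops excluded).
    norm_adj = []
    for i in range(n):
        deg = sum(adj[i][j] for j in range(n) if j != i)
        norm_adj.append([adj[i][j] / deg if j != i and deg else 0.0
                         for j in range(n)])
    own = matmul(h, w_self)                    # each joint's own features
    nbr = matmul(norm_adj, matmul(h, w_nbr))   # mean of neighbor features
    return [[self_weight * o + m for o, m in zip(row_o, row_m)]
            for row_o, row_m in zip(own, nbr)]


def modulated_dense_combine(current, previous, alpha, beta):
    """Blend the features of layer l with those of all earlier layers.
    alpha: share given to the current layer (assumed convex combination).
    beta:  assumed decay factor; older layers get weight beta ** distance."""
    count = len(previous)
    weights = [beta ** (count - 1 - k) for k in range(count)]
    total = sum(weights)
    rows, cols = len(current), len(current[0])
    blended = []
    for i in range(rows):
        row = []
        for j in range(cols):
            agg = sum(w * p[i][j] for w, p in zip(weights, previous)) / total
            row.append(alpha * current[i][j] + (1.0 - alpha) * agg)
        blended.append(row)
    return blended


# Three joints in a chain (0 - 1 - 2), one feature per joint.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = [[1.0], [2.0], [3.0]]
out = pre_weighted_graph_conv(h, adj, [[1.0]], [[1.0]], self_weight=2.0)
print(out)  # [[4.0], [6.0], [8.0]]
```

In this toy run each joint's own feature counts twice as much as the mean of its neighbors, which is one simple way to read "assigning a higher weight to the current joint".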

Key words: 3D human pose estimation, graph convolutional network, pre-weighted
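For readers unfamiliar with the metric quoted in the abstract, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below uses the standard definition; the function name `mpjpe` and the sample coordinates are illustrative and are not taken from the paper. P-MPJPE is the same error computed after the prediction has been rigidly aligned to the ground truth (Procrustes alignment); that alignment step is omitted here.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error: the average Euclidean distance
    between matching joints, in the units of the input (mm on Human3.6M)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Two joints: the first is off by 5 units, the second is exact.
pred = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
target = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]
print(mpjpe(pred, target))  # 2.5
```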