计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2024, Vol. 18, Issue (9): 2384-2394. DOI: 10.3778/j.issn.1673-9418.2307016

• Theory·Algorithm •

Domain Adaptation Algorithm for 3D Human Pose Estimation with Spatial Attention and Position Optimization

JIANG Youpeng (姜友鹏), HUA Yang (华阳), SONG Xiaoning (宋晓宁)

  1. Jiangsu Engineering Laboratory of Pattern Recognition and Computational Intelligence, School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2024-09-01 Published: 2024-09-01

Abstract: Existing 3D human pose estimators perform well on a single dataset, but because the poses in their training data are limited in structure, they generalize poorly in cross-domain experiments. Existing methods address this weakness by increasing pose diversity; however, the new poses they generate often lack realism and validity, and the distribution of their global positions still differs significantly from that of the target dataset. To address these problems, a domain adaptation algorithm for 3D human pose estimation is proposed that combines spatial attention with global position optimization and is built on a generative adversarial network (GAN). The algorithm introduces a spatial node attention module that constrains the generator to produce more natural human poses, and pairs it with a pose position correction module that drives the generated poses to align with the target data domain, thereby addressing the domain adaptation problems described above. In addition, to stabilize estimator training, an end-to-end random mixing training strategy is proposed so that the pose estimator learns from both the new and the original data. As a generative domain adaptation method, the algorithm can be applied efficiently to a variety of two-stage 3D human pose estimators. Cross-scene and cross-dataset experiments show that the proposed algorithm achieves state-of-the-art results on several benchmark datasets. On the 3DHP dataset, its MPJPE and AUC improve on the previous best method by 1.7% and 1.4%, respectively, which verifies that the algorithm effectively improves the generalization of 3D human pose estimators.

Key words: 3D human pose estimation, unsupervised domain adaptation, generative adversarial network (GAN), attention mechanism
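
The abstract reports results as MPJPE and AUC without defining them. The sketch below illustrates the definitions commonly used in 3D human pose estimation. It is not taken from the paper: the function names, the choice of millimetres as units, and the threshold grid (0 to 150 mm in 31 steps, a convention often used with 3DHP) are assumptions made here for illustration. MPJPE is the mean Euclidean distance between predicted and ground-truth joints; AUC is the area under the PCK curve, where PCK is the fraction of joints whose error falls within a threshold.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) of one joint, assumed to be in mm
Pose = Sequence[Joint]               # one pose = a sequence of joints


def joint_errors(pred: Pose, target: Pose) -> List[float]:
    """Euclidean distance between each predicted joint and its ground truth."""
    if len(pred) != len(target):
        raise ValueError("poses must have the same number of joints")
    return [math.dist(p, t) for p, t in zip(pred, target)]


def mpjpe(pred: Pose, target: Pose) -> float:
    """Mean per-joint position error: the average of the joint distances."""
    errors = joint_errors(pred, target)
    return sum(errors) / len(errors)


def pck(pred: Pose, target: Pose, threshold_mm: float) -> float:
    """Fraction of joints whose error is within threshold_mm."""
    errors = joint_errors(pred, target)
    return sum(e <= threshold_mm for e in errors) / len(errors)


def auc(pred: Pose, target: Pose,
        max_threshold_mm: float = 150.0, steps: int = 31) -> float:
    """Area under the PCK curve, approximated as the mean PCK over an
    evenly spaced threshold grid (the grid itself is an assumption)."""
    thresholds = [max_threshold_mm * i / (steps - 1) for i in range(steps)]
    return sum(pck(pred, target, t) for t in thresholds) / steps


if __name__ == "__main__":
    # Toy two-joint pose: the first joint is exact, the second is 7 mm off.
    target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    pred = [(0.0, 0.0, 0.0), (2.0, 3.0, 6.0)]
    print(mpjpe(pred, target))
    print(auc(pred, target))
```

A lower MPJPE and a higher AUC are both better. In practice the averages are taken over every joint of every pose in a test set rather than over a single pose as in this toy example.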