Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (12): 2072-2082. DOI: 10.3778/j.issn.1673-9418.2003041

• Artificial Intelligence •

Research on Occluded Objects 6DoF Pose Estimation with Multi-feature and Pixel-level Fusion

LIANG Dayong, CHEN Junhong, ZHU Zhanmo, HUANG Kesi, LIU Wenyin   

  1. School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Online:2020-12-01 Published:2020-12-11

Abstract:

Robots currently struggle to estimate accurate 6DoF poses when objects are occluded or poorly lit. To address this problem, this paper proposes a neural network framework based on pixel-level feature fusion. The framework comprises three modules: an RGB feature extraction network, a pixel-level fusion structure, and a 6D pose regression network. The RGB feature extraction network segments the target object and extracts its features. The pixel-level fusion structure fuses the RGB features with 3D multi-view features. The final module fuses the 3D point cloud pixels and outputs the 6D pose of the object. Experiments on the YCB-Video dataset, the LINEMOD dataset, and a YCB-Occlusion dataset processed for this paper show that the proposed framework predicts the 6D pose of objects effectively even when the objects are occluded or their point cloud data is partly missing. Compared with other networks, the framework is also robust, and it improves computational efficiency by hundreds of times with only a small loss of accuracy.

Key words: pixel-level fusion, convolutional neural networks (CNN), point cloud feature fusion, 6DoF pose estimation
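
The abstract names a pixel-level fusion step but does not specify how it is computed. As a rough illustration only, the sketch below assumes one common dense-fusion pattern: each pixel's RGB feature is concatenated with the geometric feature of its corresponding 3D point, and a mean-pooled global feature is appended to every pixel. The function name `fuse_pixelwise`, the feature sizes, and the pooling choice are all hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def fuse_pixelwise(rgb_feats: List[Vector], geo_feats: List[Vector]) -> List[Vector]:
    """Hypothetical pixel-level fusion (an assumption, not the paper's design).

    rgb_feats[i] and geo_feats[i] describe the same pixel: one comes from the
    colour branch, the other from the 3D (point cloud) branch.
    """
    if not rgb_feats or len(rgb_feats) != len(geo_feats):
        raise ValueError("both inputs need the same non-zero number of pixels")

    # Local fusion: concatenate the two feature vectors of each pixel.
    local = [rgb + geo for rgb, geo in zip(rgb_feats, geo_feats)]

    # Global context: mean of the fused features over all pixels.
    n, dim = len(local), len(local[0])
    global_feat = [sum(f[i] for f in local) / n for i in range(dim)]

    # Every pixel keeps its own feature and also sees the global one.
    return [f + global_feat for f in local]


# Two pixels, 2-D colour features and 1-D geometric features.
rgb = [[1.0, 0.0], [3.0, 2.0]]
geo = [[0.5], [1.5]]
fused = fuse_pixelwise(rgb, geo)
print(fused)
```

Because every pixel keeps its own fused feature, a pose regressor built on top of such a representation could still draw on visible pixels when other parts of the object are occluded or lack depth data, which is the situation the abstract targets.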