Research on Occluded Objects 6DoF Pose Estimation with Multi-feature and Pixel-level Fusion

doi:10.3778/j.issn.1673-9418.2003041

Abstract

Robots still struggle to estimate accurate 6DoF object poses when objects are occluded or the scene is poorly lit. To address this problem, this paper proposes a neural network framework based on pixel-level feature fusion. The framework comprises three modules: an RGB feature extraction network, a pixel-level fusion structure, and a 6D pose regression network. The RGB feature extraction network segments the target object and extracts its features. The pixel-level fusion structure fuses the RGB features with 3D multi-view features. The final module fuses the 3D point cloud data at the pixel level and outputs the 6D pose of the object. Experiments on the YCB-Video dataset, the LINEMOD dataset, and the YCB-Occlusion dataset processed in this paper show that the proposed framework can effectively predict the 6D pose of objects even when they are occluded or part of their point cloud data is missing. Compared with other networks, the framework is more robust and, at the cost of a small loss in accuracy, hundreds of times more computationally efficient.

Key words: pixel-level fusion, convolutional neural networks (CNN), point cloud feature fusion, 6DoF pose estimation
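The abstract names a pixel-level fusion step but gives no implementation detail. As a rough illustration of the general idea only, the sketch below concatenates a color feature and a geometry feature for each sampled pixel and then appends a max-pooled global feature. The function name `fuse_pixelwise`, the feature sizes, and the max-pooling choice are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_pixelwise(rgb_feats: Sequence[Sequence[float]],
                   point_feats: Sequence[Sequence[float]]) -> List[Vector]:
    """Fuse per-pixel color and geometry features (hypothetical helper).

    rgb_feats[i] and point_feats[i] must describe the same pixel i.
    """
    if len(rgb_feats) != len(point_feats):
        raise ValueError("each pixel needs one RGB and one point feature")
    if not rgb_feats:
        return []

    # Local fusion: concatenate the two feature vectors of each pixel.
    local = [list(c) + list(g) for c, g in zip(rgb_feats, point_feats)]

    # Assumed global context: element-wise maximum over all pixels.
    global_feat = [max(column) for column in zip(*local)]

    # Every pixel keeps its local feature and also sees the global one.
    return [feat + global_feat for feat in local]


if __name__ == "__main__":
    rgb = [[1.0, 2.0], [3.0, 0.0]]   # two pixels, 2-D color features
    pts = [[0.5], [0.1]]             # two pixels, 1-D geometry features
    for row in fuse_pixelwise(rgb, pts):
        print(row)
```

Keeping one fused vector per pixel, rather than one vector per object, is what would let a later regression stage rely on the visible pixels when part of the object is occluded. A real network would learn these features rather than take them as fixed lists.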

LIANG Dayong, CHEN Junhong, ZHU Zhanmo, HUANG Kesi, LIU Wenyin. Research on Occluded Objects 6DoF Pose Estimation with Multi-feature and Pixel-level Fusion[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(12): 2072-2082.
