Survey of 3D Model Recognition Based on Deep Learning

doi:10.3778/j.issn.1673-9418.2309010

Abstract

With the rapid advancement of three-dimensional visual perception devices such as 3D scanners and LiDAR, 3D model recognition is attracting a growing number of researchers. The field comprises two core tasks: 3D model classification and retrieval. Deep learning has already achieved significant success in two-dimensional visual tasks, and its introduction into three-dimensional visual perception has both overcome the constraints of traditional methods and produced notable progress in areas such as autonomous driving and intelligent robotics. However, applying deep learning to 3D model recognition still faces several challenges, which motivates a comprehensive review of the topic. This review first discusses commonly used evaluation metrics and public datasets, giving relevant information and sources for each dataset. It then presents representative methods from four perspectives (point clouds, views, voxels, and multimodal fusion) and summarizes recent research developments in the field. The strengths and limitations of each method are analyzed through performance comparisons on these datasets. Finally, based on the merits and demerits of these approaches, the review outlines the challenges currently facing 3D model recognition and provides an outlook on future trends in the field.

Key words: three-dimensional vision, deep learning, point clouds, views, voxels, multimodal

ZHOU Yan, LI Wenjun, DANG Zhaolong, ZENG Fanzhi, YE Dewang. Survey of 3D Model Recognition Based on Deep Learning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(4): 916-929.
