Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue (4): 916-929. DOI: 10.3778/j.issn.1673-9418.2309010
周燕,李文俊,党兆龙,曾凡智,叶德旺
Publication date: 2024-04-01
Release date: 2024-04-01
ZHOU Yan, LI Wenjun, DANG Zhaolong, ZENG Fanzhi, YE Dewang
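Abstract: With the rapid development of 3D visual perception devices such as 3D scanners and LiDAR, 3D model recognition is attracting growing attention from researchers. The core tasks in this field are the classification and retrieval of 3D models. Deep learning has achieved remarkable success in 2D vision tasks, and its introduction into 3D vision has not only overcome the limitations of traditional methods but also driven notable progress in areas such as autonomous driving and intelligent robotics. Nevertheless, applying deep learning to 3D model recognition still faces a number of challenges. This paper therefore surveys the application of deep learning to 3D model recognition. First, it discusses the commonly used evaluation metrics and public datasets, describing each dataset and its source. Next, it presents representative existing methods in detail from several perspectives, namely point clouds, views, voxels, and multimodal fusion, and reviews related work from recent years. The performance of these methods on the datasets is compared, and the strengths and limitations of each method are analyzed. Finally, based on the pros and cons of each category of methods, the paper summarizes the open challenges in 3D model recognition and outlines future development trends in the field.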
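The abstract names point clouds and voxels as two of the 3D representations the survey covers. As a rough illustration of how the two relate, the following minimal sketch converts a point cloud into a set of occupied voxel cells. The function name `voxelize`, the `resolution` parameter, and the sample `cloud` are hypothetical choices made for this example; they are not taken from the paper or from any method it reviews.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], resolution: int = 4) -> Set[Voxel]:
    """Map a point cloud to the occupied cells of a resolution^3 grid.

    Illustrative assumption: the grid is fitted to the cloud's axis-aligned
    bounding box and uses one scale for all axes.
    """
    pts = list(points)
    if not pts:
        return set()
    # Axis-aligned bounding box of the cloud.
    mins = [min(p[i] for p in pts) for i in range(3)]
    maxs = [max(p[i] for p in pts) for i in range(3)]
    # A single scale for all axes keeps the shape undistorted;
    # fall back to 1.0 when every point coincides.
    extent = max(maxs[i] - mins[i] for i in range(3)) or 1.0
    occupied: Set[Voxel] = set()
    for p in pts:
        # Clamp so that points on the upper boundary fall in the last cell.
        idx = tuple(
            min(int((p[i] - mins[i]) / extent * resolution), resolution - 1)
            for i in range(3)
        )
        occupied.add(idx)
    return occupied


# Hypothetical four-point cloud; nearby points share a voxel.
cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.1, 0.1, 0.1), (0.5, 0.0, 0.0)]
grid = voxelize(cloud, resolution=2)
```

A set of occupied cells is used here instead of a dense array because most cells of a voxel grid are typically empty; a dense 3D array would be the more usual input to a 3D convolutional network.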
周燕, 李文俊, 党兆龙, 曾凡智, 叶德旺. 深度学习的三维模型识别研究综述[J]. 计算机科学与探索, 2024, 18(4): 916-929.
ZHOU Yan, LI Wenjun, DANG Zhaolong, ZENG Fanzhi, YE Dewang. Survey of 3D Model Recognition Based on Deep Learning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(4): 916-929.