Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2023, Vol. 17, Issue 3: 549-560. DOI: 10.3778/j.issn.1673-9418.2209014
WANG Wensen, HUANG Fengrong, WANG Xu, LIU Qinglin, YI Boheng
Online: 2023-03-01
Published: 2023-03-01
Abstract: Visual-inertial odometry (VIO) exploits the complementary strengths of visual and inertial sensors to achieve high-precision 6-DoF navigation and positioning, and is therefore applied in a wide range of fields. However, errors inherent to the sensors, disturbances from abnormal visual environments, and spatio-temporal calibration errors between sensors all corrupt the navigation result and degrade accuracy. In recent years, rapidly advancing deep learning methods, with their strong data processing and prediction capabilities, have opened up an entirely new direction for VIO. This paper reviews and summarizes the main advances in deep learning-based visual-inertial odometry. First, the research methods are surveyed according to two fusion strategies: approaches that combine deep learning with traditional models, and end-to-end approaches based on deep learning. The methods are then grouped by learning type into supervised and unsupervised/self-supervised approaches, and the model structures of each are described. Next, system optimization and evaluation methods are outlined, and the performance of several representative methods is compared. Finally, the key open problems in this field are summarized and future developments are discussed.
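To make the end-to-end fusion strategy mentioned above concrete, the following is a minimal sketch in PyTorch of a learned VIO model of the kind surveyed here: a CNN encodes consecutive image pairs, a recurrent network encodes the IMU window between frames, the two feature vectors are concatenated, and a temporal LSTM regresses the relative 6-DoF pose. All module names, layer sizes, and the axis-angle pose parameterization are illustrative assumptions, not the architecture of any specific cited method.

```python
# Minimal end-to-end VIO sketch (assumed PyTorch); sizes and names are illustrative only.
import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Two stacked RGB frames (6 channels) -> global visual feature
        self.cnn = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, img_pair):            # img_pair: (B, 6, H, W)
        return self.fc(self.cnn(img_pair).flatten(1))

class ImuEncoder(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # IMU samples (accel + gyro, 6-dim) between the two frames
        self.lstm = nn.LSTM(input_size=6, hidden_size=feat_dim, batch_first=True)

    def forward(self, imu_seq):              # imu_seq: (B, T, 6)
        _, (h, _) = self.lstm(imu_seq)
        return h[-1]                          # (B, feat_dim)

class EndToEndVIO(nn.Module):
    def __init__(self):
        super().__init__()
        self.vis = VisualEncoder(256)
        self.imu = ImuEncoder(128)
        self.core = nn.LSTM(input_size=256 + 128, hidden_size=256, batch_first=True)
        self.pose_head = nn.Linear(256, 6)    # 3 translation + 3 rotation (axis-angle)

    def forward(self, img_pair, imu_seq, hidden=None):
        # Direct feature-level fusion by concatenation; one LSTM step per frame pair
        fused = torch.cat([self.vis(img_pair), self.imu(imu_seq)], dim=1)
        out, hidden = self.core(fused.unsqueeze(1), hidden)
        return self.pose_head(out.squeeze(1)), hidden   # relative 6-DoF pose

# Usage: pose, h = EndToEndVIO()(torch.rand(2, 6, 128, 416), torch.rand(2, 10, 6))
```

In supervised variants the pose head is trained against ground-truth relative poses; unsupervised/self-supervised variants instead replace that loss with photometric or geometric reconstruction terms, while the fusion structure stays broadly similar.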
WANG Wensen, HUANG Fengrong, WANG Xu, LIU Qinglin, YI Boheng. Overview of Visual Inertial Odometry Technology Based on Deep Learning[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(3): 549-560.