Survey on Visual-Guided Adversarial Imitation Learning

doi:10.3778/j.issn.1673-9418.2301067

Abstract: Optimal decision-making has long been a problem in machine learning. Imitation learning, which grew out of reinforcement learning, studies how to reconstruct a desired policy from expert data and thereby learn optimal decisions. In recent years, imitation learning has been combined with computer vision in theoretical research and has achieved good results in applications such as autonomous driving and robotics. This paper first introduces the origin of imitation learning and its two traditional research methods, behavior cloning and inverse reinforcement learning. With the development of adversarial training structures, generative adversarial imitation learning has become a key research direction, and the work that improves on it is collectively referred to as adversarial imitation learning. Next, research on adversarial imitation learning from visual demonstrations is analyzed, and common problems such as suboptimal expert demonstrations, scarce samples, and low sample efficiency are summarized together with the existing solutions to each. Then, based on experimental results, the performance of different methods on the problems they target is compared and analyzed. Finally, practical applications of adversarial visual imitation learning in scenarios such as autonomous driving and industrial robotics are described, and the paper concludes by pointing out future theoretical research directions as well as prospects and challenges for applications.

Key words: imitation learning, behavior cloning, inverse reinforcement learning, adversarial imitation learning

CUI Ming, GONG Shengrong. Survey on Visual-Guided Adversarial Imitation Learning[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(9): 2075-2091.
