Journal of Frontiers of Computer Science and Technology, 2023, Vol. 17, Issue (9): 2075-2091. DOI: 10.3778/j.issn.1673-9418.2301067
CUI Ming, GONG Shengrong
Online: 2023-09-01
Published: 2023-09-01
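Abstract: Optimal decision-making is a long-standing problem in machine learning. Imitation learning grew out of reinforcement learning and studies how to reconstruct a desired policy from expert data in order to learn optimal decisions. In recent years, imitation learning has been combined with computer vision in theoretical research and has achieved good results in applications such as autonomous driving and robotics. This survey first introduces the origins of imitation learning and its two traditional approaches, behavioral cloning and inverse reinforcement learning. With the development of adversarial training architectures, generative adversarial imitation learning has become the current research focus, and its subsequent improvements are collectively referred to as adversarial imitation learning. Second, the survey analyzes research that combines adversarial imitation learning with visual demonstrations, and summarizes the common problems in this setting (suboptimal expert demonstrations, few available samples, and low sample efficiency) together with the existing remedies for each. Third, it compares, on the basis of experimental results, how well different methods perform on the problems they target. Finally, it describes how adversarial visual imitation learning is applied in real-world scenarios such as autonomous driving and industrial robots, and summarizes future theoretical research directions as well as application prospects and challenges.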
CUI Ming, GONG Shengrong. Survey on Visual-Guided Adversarial Imitation Learning[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(9): 2075-2091.