Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (6): 918-927. DOI: 10.3778/j.issn.1673-9418.1912040

• Review · Exploration •

Review of Model-Based Reinforcement Learning

ZHAO Tingting, KONG Le, HAN Yajie, REN Dehua, CHEN Yarui   

  1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300467, China
  • Online: 2020-06-01   Published: 2020-06-04

Abstract:

Deep reinforcement learning (DRL), an important branch of machine learning, has received wide attention since AlphaGo defeated human players. DRL interacts with the environment through a trial-and-error mechanism and obtains the optimal policy by maximizing the cumulative reward. Reinforcement learning can be divided into two categories: model-free reinforcement learning and model-based reinforcement learning. Model-free methods need a large number of samples for training, so when the sampling budget is limited and large amounts of data cannot be collected, they can hardly achieve the expected performance. Model-based reinforcement learning, by contrast, can make full use of an environment model to reduce the demand for real samples and thus improve sample efficiency to a certain extent. This paper focuses on model-based reinforcement learning: it introduces the research status of the field, analyzes its classical algorithms, and discusses future development trends and application prospects.
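
For reference, the objective the abstract invokes can be written in standard RL notation as discounted-return maximization (this formulation and the discount factor are textbook conventions, not notation taken from the paper):

\pi^{*} = \arg\max_{\pi} J(\pi),
\qquad
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t}\right],
\qquad \gamma \in [0, 1)

Model-based methods additionally fit an approximate state transition model \hat{p}(s_{t+1} \mid s_{t}, a_{t}) (and possibly a reward model) from collected transitions, and then optimize J(\pi) partly against \hat{p} rather than against the real environment alone.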
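
To make the model-free versus model-based contrast concrete, the following is a minimal Dyna-Q-style sketch in Python. It is an illustration of the general idea, not an algorithm analyzed in the paper; the toy chain environment, step counts, and hyperparameters are all assumptions chosen for brevity. Each real environment step is followed by several "imagined" updates replayed from the learned model, which is how model-based methods stretch a limited sampling budget.

import random

# Illustrative sketch only: the toy chain MDP and all hyperparameters are assumptions.
N_STATES, ACTIONS = 6, (0, 1)       # chain of 6 states; action 1 moves right, 0 moves left
GAMMA, ALPHA, EPS = 0.95, 0.5, 0.1  # discount, learning rate, exploration rate

def step(s, a):
    """Toy deterministic environment: reward 1 only for reaching the last state."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def q_update(Q, s, a, r, s2):
    """One tabular Q-learning update toward the bootstrapped target."""
    target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}  # learned state transition model: (s, a) -> (s', r)

s = 0
for _ in range(200):  # real environment interactions (the scarce resource)
    a = random.choice(ACTIONS) if random.random() < EPS \
        else max(ACTIONS, key=lambda b: Q[(s, b)])
    s2, r = step(s, a)
    q_update(Q, s, a, r, s2)   # model-free update from the real sample
    model[(s, a)] = (s2, r)    # fit (here: memorize) the environment model
    for _ in range(10):        # planning: 10 imagined updates per real step
        ps, pa = random.choice(list(model))
        ps2, pr = model[(ps, pa)]
        q_update(Q, ps, pa, pr, ps2)  # reuse the model instead of new samples
    s = 0 if s2 == N_STATES - 1 else s2  # reset after reaching the goal

print([max(ACTIONS, key=lambda b: Q[(st, b)]) for st in range(N_STATES)])

With the planning loop enabled, the printed greedy policy converges to "move right" in far fewer real environment steps than the same updates would need without the model, which is the sample-efficiency effect the abstract describes.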

Key words: deep reinforcement learning (DRL), model-based reinforcement learning, state transition model, sample efficiency