Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (3): 513-526. DOI: 10.3778/j.issn.1673-9418.1905023

• Theory and Algorithm •

PAC Optimal Exploration Algorithm Named RMAX-KNN

LI Chao, MEN Changqian, WANG Wenjian

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006, China
  • Online: 2020-03-01  Published: 2020-03-13

Abstract:

The balance between exploration and exploitation is a central issue in reinforcement learning research. Exploration helps the agent understand the environment more fully so that it can make better decisions, while exploitation lets the agent make the currently optimal decision based on its present knowledge of the environment. Most existing exploration algorithms are tied only to the value function and do not take into account how well the agent currently understands the environment, so their exploration efficiency is very low. To address this problem, this paper proposes RMAX-KNN (reward maximum K-nearest neighbor), a reinforcement learning algorithm based on adaptive discretization of the state space. The algorithm rewrites the value function according to how finely the agent has so far discretized the environmental state space, uses this value function to explore the environment reasonably, and thereby gradually achieves an adaptive discretization of the state space. By coupling exploration with the discretization of the environmental state space, RMAX-KNN steadily deepens the agent's knowledge of the environment and improves exploration efficiency. The algorithm is also proven theoretically to be a probably approximately correct (PAC) optimal exploration algorithm. Simulation results on Benchmark domains show that RMAX-KNN can adaptively discretize the environmental state space while exploring it, and learns the optimal policy.

Key words: balance of exploration and exploitation, value function, adaptive discretization of state space, probably approximately correct (PAC) optimal exploration algorithm

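
The core idea described in the abstract — treat a state-action pair as "known" only when the agent's K-nearest-neighbor discretization covers it densely enough, and otherwise assign it an optimistic (RMAX-style) value to drive exploration — might be sketched as follows. This is a minimal illustration under the assumption of a Euclidean state space; the function name and the parameters `k`, `radius`, and `v_max` are hypothetical, not the paper's actual formulation.

```python
import numpy as np

def knn_optimistic_value(s, a, visited, q_estimates, k=5, radius=0.5, v_max=1.0):
    """Illustrative sketch (not the paper's algorithm): return an optimistic
    value v_max for a state-action pair that is not yet 'known' under a
    K-nearest-neighbor coverage test, and a neighbor-averaged estimate
    otherwise. `visited[a]` holds previously seen states for action a and
    `q_estimates[a]` their stored value estimates."""
    if len(visited.get(a, [])) < k:
        return v_max  # too few samples for this action: stay optimistic
    pts = np.array(visited[a])
    dists = np.linalg.norm(pts - np.asarray(s), axis=1)
    nn = np.argsort(dists)[:k]
    if dists[nn].max() > radius:
        return v_max  # nearest neighbors too far away: region unexplored
    # 'known' region: average the stored estimates of the k nearest neighbors
    return float(np.mean([q_estimates[a][i] for i in nn]))
```

Because unexplored regions always look maximally rewarding, a greedy agent is pulled toward exactly the parts of the state space whose discretization is still too coarse, which is the exploration behavior the abstract describes.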