The balance between exploration and exploitation is a central focus of reinforcement learning research. Exploration helps the agent understand the environment more comprehensively and thereby make better decisions, while exploitation lets the agent make the currently optimal decisions based on its existing knowledge of the environment. At present, most exploration algorithms depend only on the value function and ignore the agent's current cognitive level of the environment, so their exploration efficiency is extremely low. To address this problem, this paper proposes a reinforcement learning algorithm named RMAX-KNN (reward maximum K-nearest neighbor) based on adaptive discretization of the state space. The algorithm rewrites the value function according to the degree of discretization of the state space, guiding the agent to explore the environment reasonably and gradually achieving adaptive discretization of the environmental state space. By coupling exploration with the discretization of the state space, RMAX-KNN gradually raises the agent's cognitive level of the environment and improves exploration efficiency. Moreover, the algorithm is theoretically proven to be a probably approximately correct (PAC) optimal exploration algorithm. Simulation experiments on benchmark domains show that RMAX-KNN can adaptively discretize the environmental state space while exploring the environment and learning the optimal policy.
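The abstract does not give the algorithm's details, but the core idea it describes (an RMAX-style optimistic value assigned to states that are not yet "known", with knownness judged by K-nearest-neighbor density over visited states) can be illustrated with a minimal sketch. All names, thresholds, and the neighbor-averaging rule below are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

class RMaxKNNSketch:
    """Hypothetical sketch of an RMAX-style, KNN-based exploration rule.

    A queried state is treated as "known" once at least k visited states
    lie within radius r of it; unknown states receive the optimistic
    value R_max / (1 - gamma), which drives the agent to explore them.
    """

    def __init__(self, k=3, radius=0.1, r_max=1.0, gamma=0.9):
        self.k = k
        self.radius = radius
        self.v_max = r_max / (1.0 - gamma)  # optimistic upper bound on value
        self.states = []  # visited states (as float vectors)
        self.values = []  # value estimates recorded at those states

    def observe(self, state, value):
        """Record a visited state and its current value estimate."""
        self.states.append(np.asarray(state, dtype=float))
        self.values.append(float(value))

    def value(self, state):
        """Optimistic value: V_max if "unknown", else KNN average."""
        state = np.asarray(state, dtype=float)
        if not self.states:
            return self.v_max
        dists = np.linalg.norm(np.stack(self.states) - state, axis=1)
        if int((dists <= self.radius).sum()) < self.k:
            # Too few neighbors: state is "unknown", stay optimistic.
            return self.v_max
        # "Known" region: average the values of the k nearest neighbors.
        idx = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.values)[idx]))
```

In this sketch the neighbor count plays the role of the discretization level: dense regions of visited states behave like fine, "known" cells whose values are interpolated, while sparse regions keep the optimistic RMAX value and therefore attract further exploration.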

%U http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.1905023