计算机科学与探索, 2020, Vol. 14, Issue (2): 307-316. DOI: 10.3778/j.issn.1673-9418.1901020

• Artificial Intelligence •

Bayesian Deep Reinforcement Learning Algorithm for Solving Deep Exploration Problems

YANG Min, WANG Jie

  1. College of Information Science and Engineering, Central South University, Changsha 410083, China
  • Online:2020-02-01 Published:2020-02-16

Bayesian Deep Reinforcement Learning Algorithm for Solving Deep Exploration Problems

YANG Min, WANG Jie   

  1. College of Information Science and Engineering, Central South University, Changsha 410083, China
  • Online:2020-02-01 Published:2020-02-16

Abstract:

In the field of reinforcement learning, how to balance exploration and exploitation is a hard problem. Reinforcement learning methods proposed in recent years focus mainly on combining deep learning techniques to improve the generalization ability of algorithms, while neglecting the exploration-exploitation dilemma. Traditional reinforcement learning methods can solve the exploration problem effectively, but only under a restrictive condition: the state space of the Markov decision process must be discrete and finite. This paper proposes a Bayesian method to improve the exploration efficiency of deep reinforcement learning algorithms, and extends the method for computing the posterior distribution of parameters in Bayesian linear regression to nonlinear models such as artificial neural networks. Combining Bootstrapped DQN with the proposed computational method yields the Bayesian Bootstrapped deep Q-network algorithm (BBDQN). Finally, experiments in two environments show that BBDQN explores more efficiently than DQN and Bootstrapped DQN when facing deep exploration problems.
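For reference, the parameter-posterior computation in Bayesian linear regression that the abstract refers to is the standard conjugate-Gaussian result; the notation below is generic and not taken from the paper. With feature matrix \Phi, target vector y, prior w \sim \mathcal{N}(0, \sigma_p^2 I), and observation noise of variance \sigma_n^2, the posterior over the weights is

\[
  w \mid \mathcal{D} \sim \mathcal{N}(\mu, \Sigma), \qquad
  \Sigma = \Bigl( \tfrac{1}{\sigma_n^{2}} \Phi^{\top} \Phi + \tfrac{1}{\sigma_p^{2}} I \Bigr)^{-1}, \qquad
  \mu = \tfrac{1}{\sigma_n^{2}} \, \Sigma \, \Phi^{\top} y .
\]

Sampling weight vectors from this posterior, rather than acting on a point estimate, is what supplies the parameter uncertainty used for deep exploration; the extension described above carries this computation over to nonlinear models such as neural networks.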

Key words: deep reinforcement learning, exploration and exploitation, Bayes' theorem

Abstract:

In the field of reinforcement learning, how to balance exploration and exploitation is a hard problem. The reinforcement learning methods proposed in recent years mainly focus on how to combine deep learning technology to improve the generalization ability of the algorithm, but ignore the exploration-exploitation dilemma. Traditional reinforcement learning methods can effectively solve the exploration problem, but under a certain restriction: the state space of the Markov decision process must be discrete and finite. In this paper, a Bayesian method is proposed to improve the exploration efficiency of deep reinforcement learning algorithms, and the main contribution is to extend the method of calculating the posterior distribution of parameters in Bayesian linear regression to nonlinear models such as artificial neural networks. By combining Bootstrapped DQN (deep Q-network) and the computational method proposed in this paper, Bayesian Bootstrapped DQN (BBDQN) is obtained. Finally, the results of experiments in two environments show that BBDQN explores more efficiently than DQN and Bootstrapped DQN in the face of deep exploration problems.
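As a concrete illustration of how such a posterior can drive exploration, the sketch below keeps a Bayesian linear regression posterior over last-layer Q-weights (one independent regression per action) and chooses actions by Thompson sampling. It is a minimal sketch under assumed details (fixed features, unit prior and noise scales, a placeholder Bellman target), not the paper's exact BBDQN construction, which additionally builds on the bootstrapped ensemble of Bootstrapped DQN; the names BayesianLastLayer and thompson_action are illustrative, not from the paper.

import numpy as np

class BayesianLastLayer:
    """Gaussian posterior over last-layer weights for one action.

    Prior: w ~ N(0, sigma_p^2 I).  Likelihood: target = phi(s)^T w + noise,
    noise ~ N(0, sigma_n^2).  Standard Bayesian linear regression update.
    """

    def __init__(self, feat_dim, sigma_p=1.0, sigma_n=1.0):
        self.precision = np.eye(feat_dim) / sigma_p ** 2  # posterior precision matrix
        self.b = np.zeros(feat_dim)                       # precision-weighted mean accumulator
        self.sigma_n = sigma_n

    def update(self, phi, target):
        # Rank-1 update of the posterior with one (feature, target) pair.
        self.precision += np.outer(phi, phi) / self.sigma_n ** 2
        self.b += phi * target / self.sigma_n ** 2

    def sample_weights(self, rng):
        # Draw one weight vector from the current posterior N(mu, Sigma).
        cov = np.linalg.inv(self.precision)
        mean = cov @ self.b
        return rng.multivariate_normal(mean, cov)


def thompson_action(phi, posteriors, rng):
    """Sample Q(s, a) = phi^T w_a with w_a drawn from each action's posterior
    and act greedily on the sampled values (Thompson sampling)."""
    sampled_q = [phi @ p.sample_weights(rng) for p in posteriors]
    return int(np.argmax(sampled_q))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_dim, n_actions = 8, 4
    # In a DQN-style agent, phi(s) would come from the shared network body;
    # random features are used here only to keep the sketch self-contained.
    posteriors = [BayesianLastLayer(feat_dim) for _ in range(n_actions)]
    for _ in range(200):
        phi = rng.normal(size=feat_dim)
        a = thompson_action(phi, posteriors, rng)
        bellman_target = rng.normal()  # placeholder for r + gamma * max_a' Q(s', a')
        posteriors[a].update(phi, bellman_target)
    print("Sampled greedy action for a new state:",
          thompson_action(rng.normal(size=feat_dim), posteriors, rng))

Keeping the posterior as a precision matrix plus a precision-weighted sum makes each update a cheap rank-1 addition; the covariance is only materialized when a weight sample is needed.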

Key words: deep reinforcement learning, exploration and exploitation, Bayes' theorem