[1] AGOSTINELLI F, HOCQUET G, SINGH S, et al. From reinforcement learning to deep reinforcement learning: an overview[C]// Proceedings of the International Conference Commemorating the 40th Anniversary of Emmanuil Braverman's Decease, 2018: 298-328.
[2] LIU Q, ZHAI J W, ZHANG Z Z, et al. A survey on deep reinforcement learning[J]. Chinese Journal of Computers, 2018, 41(1): 1-27.
[3] DA SILVA F L, COSTA A H R. A survey on transfer learning for multiagent reinforcement learning systems[J]. Journal of Artificial Intelligence Research, 2019, 64: 645-703.
[4] SONG J, GAO Y, WANG H, et al. Measuring the distance between finite Markov decision processes[C]// Proceedings of the 15th International Conference on Autonomous Agents and Multi-Agent Systems, Singapore, May 9-13, 2016. New York: ACM, 2016: 468-476.
[5] LIU Y, HU Y, GAO Y, et al. Value function transfer for deep multi-agent reinforcement learning based on N-step returns[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, Aug 10-16, 2019. San Francisco: Morgan Kaufmann, 2019: 457-463.
[6] LI S, ZHANG C. An optimal online method of selecting source policies for reinforcement learning[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence, the 30th Innovative Applications of Artificial Intelligence, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, Feb 2-7, 2018. Menlo Park: AAAI, 2018: 3562-3570.
[7] YANG T, HAO J, MENG Z, et al. Efficient deep reinforcement learning via adaptive policy transfer[C]// Proceedings of the 29th International Joint Conference on Artificial Intelligence, Yokohama, Jan 7-15, 2020. San Francisco: Morgan Kaufmann, 2020: 3094-3100.
[8] SUTTON R S, PRECUP D, SINGH S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning[J]. Artificial Intelligence, 1999, 112(1/2): 181-211.
[9] ZHOU Z H. Ensemble methods: foundations and algorithms[M]. Beijing: Publishing House of Electronics Industry, 2020.
[10] ZHU Z, LIN K, ZHOU J. Transfer learning in deep reinforcement learning: a survey[J]. arXiv:2009.07888, 2021.
[11] LI S, GU F, ZHU G, et al. Context-aware policy reuse[C]// Proceedings of the 18th International Conference on Autonomous Agents and Multi-Agent Systems, Montreal, May 13-17, 2019. Richland: International Foundation for Autonomous Agents and Multiagent Systems, 2019: 989-997.
[12] RUSU A A, COLMENAREJO S G, GÜLÇEHRE C, et al. Policy distillation[J]. arXiv:1511.06295, 2015.
[13] BACON P L, HARB J, PRECUP D. The option-critic architecture[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, Feb 4-9, 2017. Menlo Park: AAAI, 2017: 1726-1734.
[14] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[J]. arXiv:1503.02531, 2015.
[15] SCHMITT S, HUDSON J J, ZIDEK A, et al. Kickstarting deep reinforcement learning[J]. arXiv:1803.03835, 2018.
[16] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[J]. arXiv:1606.01540, 2016.
[17] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]// Proceedings of the 33rd International Conference on Machine Learning, New York, Jun 19-24, 2016: 1928-1937.