[1] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[2] 王扬, 陈智斌, 吴兆蕊, 等. 强化学习求解组合最优化问题的研究综述[J]. 计算机科学与探索, 2022, 16(2): 261-279.
WANG Y, CHEN Z B, WU Z R, et al. Review of reinforcement learning for combinatorial optimization problem[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(2): 261-279.
[3] KUNG T H, CHEATHAM M, MEDENILLA A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models[J]. PLoS Digital Health, 2023, 2(2): e0000198.
[4] BOMMARITO M J, KATZ D M. GPT takes the bar exam[J]. arXiv:2212.14402, 2022.
[5] OUYANG L, WU J, JIANG X, et al. Training language models to follow instructions with human feedback[J]. arXiv:2203.02155, 2022.
[6] HU Y, WANG W, JIA H, et al. Learning to utilize shaping rewards: a new approach of reward shaping[J]. arXiv:2011.02669, 2020.
[7] SOVIANY P, IONESCU R T, ROTA P, et al. Curriculum learning: a survey[J]. International Journal of Computer Vision, 2022, 130(6): 1526-1565.
[8] NAM T, SUN S H, PERTSCH K, et al. Skill-based meta-reinforcement learning[J]. arXiv:2204.11828, 2022.
[9] 韩旭, 吴锋. 结合对比预测的离线元强化学习方法[J]. 计算机科学与探索, 2023, 17(8): 1917-1927.
HAN X, WU F. Offline meta-reinforcement learning with contrastive prediction[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(8): 1917-1927.
[10] ZHOU F, CAO C. Overcoming catastrophic forgetting in graph neural networks with experience replay[C]//Proceedings of the 2021 AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2021: 4714-4722.
[11] SAGLAM B, MUTLU F B, CICEK D C, et al. Actor prioritized experience replay[J]. arXiv:2209.00532, 2022.
[12] ANDRYCHOWICZ M, WOLSKI F, RAY A, et al. Hindsight experience replay[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 5055-5065.
[13] HE Q, ZHUANG L, LI H. Soft hindsight experience replay[J]. arXiv:2002.02089, 2020.
[14] LIU H, TROTT A, SOCHER R, et al. Competitive experience replay[J]. arXiv:1902.00528, 2019.
[15] NGUYEN H, LA H M, DEANS M C. Hindsight experience replay with experience ranking[C]//Proceedings of the 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics. Piscataway: IEEE, 2019: 1-6.
[16] FANG M, ZHOU T, DU Y, et al. Curriculum-guided hindsight experience replay[C]//Advances in Neural Information Processing Systems 32, Vancouver, Dec 8-14, 2019: 12602-12613.
[17] SCHRAMM L, DENG Y, GRANADOS E, et al. USHER: unbiased sampling for hindsight experience replay[J]. arXiv:2207.01115, 2022.
[18] LUU T M, YOO C D. Hindsight goal ranking on replay buffer for sparse reward environment[J]. IEEE Access, 2021, 9: 51996-52007.
[19] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]//Proceedings of the 4th International Conference on Learning Representations, San Juan, May 2-4, 2016.
[20] ZHANG J, HE T, SRA S, et al. Why gradient clipping accelerates training: a theoretical justification for adaptivity[C]//Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Apr 26-30, 2020.
[21] PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven exploration by self-supervised prediction[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2017: 488-489.
[22] LEVINE S, KUMAR A, TUCKER G, et al. Offline reinforcement learning: tutorial, review, and perspectives on open problems[J]. arXiv:2005.01643, 2020.