[1] ARULKUMARAN K, DEISENROTH M P, BRUNDAGE M, et al. Deep reinforcement learning: a brief survey[J]. IEEE Signal Processing Magazine, 2017, 34(6): 26-38.
[2] KOBER J, BAGNELL J A, PETERS J. Reinforcement learning in robotics: a survey[J]. The International Journal of Robotics Research, 2013, 32(11): 1238-1274.
[3] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 2018.
[4] LEVINE S, KUMAR A, TUCKER G, et al. Offline reinforcement learning: tutorial, review, and perspectives on open problems[J]. arXiv:2005.01643, 2020.
[5] FUJIMOTO S, MEGER D, PRECUP D. Off-policy deep reinforcement learning without exploration[C]//Proceedings of the 36th International Conference on Machine Learning, Long Beach, Jun 9-15, 2019. Cambridge: JMLR, 2019: 2052-2062.
[6] KUMAR A, ZHOU A, TUCKER G, et al. Conservative Q-learning for offline reinforcement learning[C]//Advances in Neural Information Processing Systems 33, Dec 6-12, 2020: 1179-1191.
[7] ERNST D, GEURTS P, WEHENKEL L. Tree-based batch mode reinforcement learning[J]. Journal of Machine Learning Research, 2005, 6: 503-556.
[8] WU Y, TUCKER G, NACHUM O. Behavior regularized offline reinforcement learning[J]. arXiv:1911.11361, 2019.
[9] FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//Proceedings of the 34th International Conference on Machine Learning, Sydney, Aug 6-11, 2017. Cambridge: JMLR, 2017: 1126-1135.
[10] GUPTA A, MENDONCA R, LIU Y, et al. Meta-reinforcement learning of structured exploration strategies[C]//Advances in Neural Information Processing Systems 31, Montréal, Dec 3-8, 2018. New York: Curran Associates, 2018: 5302-5311.
[11] ROTHFUSS J, LEE D, CLAVERA I, et al. ProMP: proximal meta-policy search[C]//Proceedings of the 2019 International Conference on Learning Representations, New Orleans, May 6-9, 2019: 1-25.
[12] RAKELLY K, ZHOU A, FINN C, et al. Efficient off-policy meta-reinforcement learning via probabilistic context variables[C]//Proceedings of the 36th International Conference on Machine Learning, Long Beach, Jun 9-15, 2019: 5331-5340.
[13] LI J, VUONG Q, LIU S, et al. Multi-task batch reinforcement learning with metric learning[C]//Advances in Neural Information Processing Systems 33, Dec 6-12, 2020. New York: Curran Associates, 2020: 6197-6210.
[14] LI L, YANG R, LUO D. FOCAL: efficient fully-offline meta-reinforcement learning via distance metric learning and behavior regularization[C]//Proceedings of the 9th International Conference on Learning Representations, May 3-7, 2021: 1-11.
[15] MITCHELL E, RAFAILOV R, PENG X B, et al. Offline meta-reinforcement learning with advantage weighting[C]//Proceedings of the 38th International Conference on Machine Learning, Jul 18-24, 2021: 7780-7791.
[16] PENG X B, KUMAR A, ZHANG G, et al. Advantage-weighted regression: simple and scalable off-policy reinforcement learning[J]. arXiv:1910.00177, 2019.
[17] OORD A V, LI Y, VINYALS O. Representation learning with contrastive predictive coding[J]. arXiv:1807.03748, 2018.
[18] HE K, FAN H, WU Y, et al. Momentum contrast for unsupervised visual representation learning[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 9729-9738.
[19] LASKIN M, SRINIVAS A, ABBEEL P. CURL: contrastive unsupervised representations for reinforcement learning[C]//Proceedings of the 37th International Conference on Machine Learning, Jul 12-18, 2020: 5639-5650.
[20] FU H, TANG H, HAO J, et al. Towards effective context for meta-reinforcement learning: an approach based on contrastive learning[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence, the 33rd Conference on Innovative Applications of Artificial Intelligence, the 11th Symposium on Educational Advances in Artificial Intelligence, Feb 2-9, 2021. Palo Alto: AAAI Press, 2021: 7457-7465.
[21] FUJIMOTO S, GU S S. A minimalist approach to offline reinforcement learning[C]//Advances in Neural Information Processing Systems 34, Dec 6-14, 2021: 20132-20145.
[22] LI L, HUANG Y, CHEN M, et al. Provably improved context-based offline meta-RL with attention and contrastive learning[J]. arXiv:2102.10774, 2021.
[23] FAKOOR R, CHAUDHARI P, SOATTO S, et al. Meta-Q-Learning[C]//Proceedings of the 2020 International Conference on Learning Representations, Apr 26-May 1, 2020: 1-17.
[24] ZHOU W, PINTO L, GUPTA A. Environment probing interaction policies[C]//Proceedings of the 2019 International Conference on Learning Representations, New Orleans, May 6-9, 2019: 1-13.
[25] LEE K, SEO Y, LEE S, et al. Context-aware dynamics model for generalization in model-based reinforcement learning[C]//Proceedings of the 37th International Conference on Machine Learning, Jul 12-18, 2020: 5757-5766.
[26] CHUNG J, GULCEHRE C, CHO K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv:1412.3555, 2014.
[27] KOSTRIKOV I, FERGUS R, TOMPSON J, et al. Offline reinforcement learning with Fisher divergence critic regularization[C]//Proceedings of the 38th International Conference on Machine Learning, Jul 18-24, 2021: 5774-5783.
[28] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]//Proceedings of the 35th International Conference on Machine Learning, Stockholmsmässan, Jul 10-15, 2018: 1582-1591.
[29] TODOROV E, EREZ T, TASSA Y. MuJoCo: a physics engine for model-based control[C]//Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura, Oct 7-11, 2012. Piscataway: IEEE, 2012: 5026-5033.
[30] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//Proceedings of the 35th International Conference on Machine Learning, Stockholmsmässan, Jul 10-15, 2018: 1861-1870.