[1] DU X, WANG J, CHEN S, et al. Multi-agent deep reinforcement learning with spatio-temporal feature fusion for traffic signal control[C]//Proceedings of the 2021 European Conference on Machine Learning and Knowledge Discovery in Databases, Applied Data Science Track, Bilbao, Sep 13-17, 2021. Cham: Springer, 2021: 470-485.
[2] LI M, QIN Z, JIAO Y, et al. Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning[C]//Proceedings of the 2019 World Wide Web Conference, San Francisco, May 13-17, 2019. New York: ACM, 2019: 983-994.
[3] ZHOU M, WAN Z, WANG H, et al. MALib: a parallel framework for population-based multi-agent reinforcement learning[J]. Journal of Machine Learning Research, 2023, 24.
[4] SINGH B, KUMAR R, SINGH V P. Reinforcement learning in robotic applications: a comprehensive survey[J]. Artificial Intelligence Review, 2022, 55: 1-46.
[5] SINGLA A, RAFFERTY A N, RADANOVIC G, et al. Reinforcement learning for education: opportunities and challenges[EB/OL]. [2023-05-23]. https://arxiv.org/abs/2107.08828.
[6] LIU S, SEE K C, NGIAM K Y, et al. Reinforcement learning for clinical decision support in critical care: comprehensive review[J]. Journal of Medical Internet Research, 2020, 22(7): e18477.
[7] KIRAN B R, SOBH I, TALPAERT V, et al. Deep reinforcement learning for autonomous driving: a survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(6): 4909-4926.
[8] FUJIMOTO S, MEGER D, PRECUP D. Off-policy deep reinforcement learning without exploration[C]//Proceedings of the 36th International Conference on Machine Learning, Long Beach, Jun 9-15, 2019: 2052-2062.
[9] PENG X B, KUMAR A, ZHANG G, et al. Advantage-weighted regression: simple and scalable off-policy reinforcement learning[EB/OL]. [2023-05-23]. https://arxiv.org/abs/1910.00177.
[10] WU Y, TUCKER G, NACHUM O. Behavior regularized offline reinforcement learning[EB/OL]. [2023-05-23]. https://arxiv.org/abs/1911.11361v1.
[11] KUMAR A, ZHOU A, TUCKER G, et al. Conservative Q-learning for offline reinforcement learning[C]//Advances in Neural Information Processing Systems 33, Dec 6-12, 2020: 1179-1191.
[12] WU Y, ZHAI S, SRIVASTAVA N, et al. Uncertainty weighted actor-critic for offline reinforcement learning[C]//Proceedings of the 38th International Conference on Machine Learning, Jul 18-24, 2021: 11319-11328.
[13] YANG Y, MA X, LI C, et al. Believe what you see: implicit constraint approach for offline multi-agent reinforcement learning[C]//Advances in Neural Information Processing Systems 34, Dec 6-14, 2021: 10299-10312.
[14] WEN M, KUBA J, LIN R, et al. Multi-agent reinforcement learning is a sequence modeling problem[C]//Advances in Neural Information Processing Systems 35, New Orleans, Nov 28-Dec 9, 2022: 16509-16521.
[15] BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[C]//Advances in Neural Information Processing Systems 33, Dec 6-12, 2020: 1877-1901.
[16] CHEN M, RADFORD A, CHILD R, et al. Generative pretraining from pixels[C]//Proceedings of the 37th International Conference on Machine Learning, Jul 13-18, 2020: 1691-1703.
[17] LU K, GROVER A, ABBEEL P, et al. Pretrained transformers as universal computation engines[C]//Proceedings of the 36th AAAI Conference on Artificial Intelligence, the 34th Conference on Innovative Applications of Artificial Intelligence, the 12th Symposium on Educational Advances in Artificial Intelligence, Feb 22-Mar 1, 2022: 7628-7636.
[18] FURUTA H, MATSUO Y, GU S S. Generalized decision transformer for offline hindsight information matching[C]//Proceedings of the 10th International Conference on Learning Representations, Apr 25-29, 2022.
[19] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[R]. San Francisco: OpenAI, 2018.
[20] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Jun 2-7, 2019. Stroudsburg: ACL, 2019: 4171-4186.
[21] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Oct 10-17, 2021. Piscataway: IEEE, 2021: 10012-10022.
[22] ZHAI X, KOLESNIKOV A, HOULSBY N, et al. Scaling vision transformers[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, Jun 19-21, 2022. Piscataway: IEEE, 2022: 12104-12113.
[23] PARISOTTO E, SONG F, RAE J, et al. Stabilizing transformers for reinforcement learning[C]//Proceedings of the 37th International Conference on Machine Learning, Jul 13-18, 2020: 7487-7498.
[24] CHEN L, LU K, RAJESWARAN A, et al. Decision transformer: reinforcement learning via sequence modeling[C]//Advances in Neural Information Processing Systems 34, Dec 6-14, 2021: 15084-15097.
[25] JANNER M, LI Q, LEVINE S. Offline reinforcement learning as one big sequence modeling problem[C]//Advances in Neural Information Processing Systems 34, Dec 6-14, 2021: 1273-1286.
[26] DASARI S, GUPTA A. Transformers for one-shot visual imitation[C]//Proceedings of the 2021 Conference on Robot Learning, London, Nov 8-11, 2021: 2071-2084.
[27] ZHANG K, YANG Z, LIU H, et al. Finite-sample analysis for decentralized batch multiagent reinforcement learning with networked agents[J]. IEEE Transactions on Automatic Control, 2021, 66(12): 5925-5940.
[28] PAN L, HUANG L, MA T, et al. Plan better amid conservatism: offline multi-agent reinforcement learning with actor rectification[C]//Proceedings of the 39th International Conference on Machine Learning, Baltimore, Jul 17-23, 2022: 17221-17237.
[29] MENG L, WEN M, LE C, et al. Offline pre-trained multi-agent decision transformer[J]. Machine Intelligence Research, 2023, 20(2): 233-248.
[30] TSENG W C, WANG T H J, LIN Y C, et al. Offline multi-agent reinforcement learning with knowledge distillation[C]//Advances in Neural Information Processing Systems 35, New Orleans, Nov 28-Dec 9, 2022: 226-237.
[31] ABEL D, HERSHKOWITZ D, LITTMAN M. Near optimal behavior via approximate state abstraction[C]//Proceedings of the 2016 International Conference on Machine Learning, New York, Jun 19-24, 2016: 2915-2923.
[32] NACHUM O, GU S, LEE H, et al. Near-optimal representation learning for hierarchical reinforcement learning[C]//Proceedings of the 2018 International Conference on Learning Representations, Vancouver, Apr 30-May 3, 2018: 1-7.
[33] HAFNER D, LILLICRAP T P, NOROUZI M, et al. Mastering Atari with discrete world models[C]//Proceedings of the 9th International Conference on Learning Representations, May 3-7, 2021.
[34] LEVINE N, CHOW Y, SHU R, et al. Prediction, consistency, curvature: representation learning for locally-linear control[EB/OL]. [2023-05-23]. https://arxiv.org/abs/1909.01506.
[35] YANG M, NACHUM O. Representation matters: offline pretraining for sequential decision making[C]//Proceedings of the 38th International Conference on Machine Learning, Jul 18-24, 2021: 11784-11794.
[36] STOOKE A, LEE K, ABBEEL P, et al. Decoupling representation learning from reinforcement learning[C]//Proceedings of the 38th International Conference on Machine Learning, Jul 18-24, 2021: 9870-9879.
[37] KUMAR A, HONG J, SINGH A, et al. Should I run offline reinforcement learning or behavioral cloning?[C]//Proceedings of the 10th International Conference on Learning Representations, Apr 25-29, 2022: 15-51.