Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (8): 2169-2179. DOI: 10.3778/j.issn.1673-9418.2307034
• Artificial Intelligence · Pattern Recognition •
SHENG Lei, CHEN Xiliang, LAI Jun
Online: 2024-08-01
Published: 2024-07-29
SHENG Lei, CHEN Xiliang, LAI Jun. Offline Multi-agent Reinforcement Learning Method Based on Latent State Distribution GPT[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(8): 2169-2179.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2307034