[1] GUPTA J K, EGOROV M, KOCHENDERFER M. Cooperative multi-agent control using deep reinforcement learning[C]//Autonomous Agents and Multiagent Systems: AAMAS 2017 Workshops. Cham: Springer, 2017: 66-83.
[2] NGUYEN T T, NGUYEN N D, NAHAVANDI S. Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications[J]. IEEE Transactions on Cybernetics, 2020, 50(9): 3826-3839.
[3] XU C, YIN N, DUAN S H, et al. Reward-filtering-based credit assignment for multi-agent deep reinforcement learning[J]. Chinese Journal of Computers, 2022, 45(11): 2306-2320. (in Chinese)
[4] CAO Y C, YU W W, REN W, et al. An overview of recent progress in the study of distributed multi-agent coordination[J]. IEEE Transactions on Industrial Informatics, 2013, 9(1): 427-438.
[5] CUI J J, LIU Y W, NALLANATHAN A. Multi-agent reinforcement learning-based resource allocation for UAV networks[J]. IEEE Transactions on Wireless Communications, 2020, 19(2): 729-743.
[6] LIU X L, YU J D, FENG Z Y, et al. Multi-agent reinforcement learning for resource allocation in IoT networks with edge computing[J]. China Communications, 2020, 17(9): 220-236.
[7] LI J C, SHI H B, HUANG G S. A multi-agent reinforcement learning method based on self-attention mechanism and policy mapping recombination[J]. Chinese Journal of Computers, 2022, 45(9): 1842-1858. (in Chinese)
[8] SCHÖLLIG A, ALONSO-MORA J, D’ANDREA R. Independent vs. joint estimation in multi-agent iterative learning control[C]//Proceedings of the 49th IEEE Conference on Decision and Control. Piscataway: IEEE, 2010: 6949-6954.
[9] POSOR J E, BELZNER L, KNAPP A. Joint action learning for multi-agent cooperation using recurrent reinforcement learning[J]. Digitale Welt, 2020, 4(1): 79-84.
[10] FOERSTER J, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 2974-2982.
[11] LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[EB/OL]. [2024-07-15]. https://arxiv.org/abs/1706.02275.
[12] SUNEHAG P, LEVER G, GRUSLYS A, et al. Value-decomposition networks for cooperative multi-agent learning[EB/OL]. [2024-07-15]. https://arxiv.org/abs/1706.05296.
[13] RASHID T, SAMVELYAN M, DE WITT C S, et al. Monotonic value function factorisation for deep multi-agent reinforcement learning[EB/OL]. [2024-07-15]. https://arxiv.org/abs/2003.08839.
[14] WANG J H, REN Z Z, LIU T, et al. QPLEX: duplex dueling multi-agent Q-learning[EB/OL]. [2024-07-15]. https://arxiv.org/abs/2008.01062.
[15] SAMVELYAN M, RASHID T, DE WITT C S, et al. The StarCraft multi-agent challenge[EB/OL]. [2024-07-15]. https://arxiv.org/abs/1902.04043.
[16] SU J Y, ADAMS S, BELING P. Value-decomposition multi-agent actor-critics[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(13): 11352-11360.
[17] YU C, VELU A, VINITSKY E, et al. The surprising effectiveness of PPO in cooperative, multi-agent games[EB/OL]. [2024-07-18]. https://arxiv.org/abs/2103.01955.
[18] DING Z, HUANG T, LU Z. Learning individually inferred communication for multi-agent cooperation[C]//Advances in Neural Information Processing Systems 33, 2020: 22069-22079.
[19] DAS A, GERVET T, ROMOFF J, et al. TarMAC: targeted multi-agent communication[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1810.11187.
[20] SINGH A, JAIN T, SUKHBAATAR S. Learning when to communicate at scale in multiagent cooperative and competitive tasks[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1812.09755.
[21] YUAN L, WANG J H, ZHANG F X, et al. Multi-agent incentive communication via decentralized teammate modeling[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(9): 9466-9474.
[22] WANG T H, WANG J H, ZHENG C Y, et al. Learning nearly decomposable value functions via communication minimization[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1910.05366.
[23] XU D, CHEN G. The research on intelligent cooperative combat of UAV cluster with multi-agent reinforcement learning[J]. Aerospace Systems, 2022, 5(1): 107-121.
[24] SUTTON R S. Temporal credit assignment in reinforcement learning[D]. Amherst: University of Massachusetts Amherst, 1984.
[25] TAMPUU A, MATIISEN T, KODELJA D, et al. Multiagent cooperation and competition with deep reinforcement learning[J]. PLoS One, 2017, 12(4): e0172395.
[26] SON K, KIM D, KANG W J, et al. QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 5887-5896.
[27] MAHAJAN A, RASHID T, SAMVELYAN M, et al. MAVEN: multi-agent variational exploration[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1910.07483.
[28] YANG Y D, HAO J Y, LIAO B, et al. Qatten: a general framework for cooperative multiagent reinforcement learning[EB/OL]. [2024-07-18]. https://arxiv.org/abs/2002.03939.
[29] DA SILVA F L, HERNANDEZ-LEAL P, KARTAL B, et al. Uncertainty-aware action advising for deep reinforcement learning agents[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(4): 5792-5799.
[30] LÜTJENS B, EVERETT M, HOW J P. Safe reinforcement learning with model uncertainty estimates[C]//Proceedings of the 2019 International Conference on Robotics and Automation. Piscataway: IEEE, 2019: 8662-8668.
[31] WANG Y, ZOU S. Online robust reinforcement learning with model uncertainty[C]//Advances in Neural Information Processing Systems 34, 2021: 7193-7206.
[32] ZHANG K, SUN T, TAO Y, et al. Robust multi-agent reinforcement learning with model uncertainty[C]//Advances in Neural Information Processing Systems 33, 2020: 10571-10583.
[33] GAO X, LI X Y, LIU Q, et al. Multi-agent decision-making modes in uncertain interactive traffic scenarios via graph convolution-based deep reinforcement learning[J]. Sensors, 2022, 22(12): 4586.
[34] TANG B H, ZHONG Y Q, XU C X, et al. Collaborative uncertainty benefits multi-agent multi-modal trajectory forecasting[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(11): 13297-13313.
[35] SUKHBAATAR S, SZLAM A, FERGUS R. Learning multiagent communication with backpropagation[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1605.07736.
[36] MAO H Y, ZHANG Z C, XIAO Z, et al. Learning multi-agent communication with double attentional deep reinforcement learning[J]. Autonomous Agents and Multi-Agent Systems, 2020, 34(1): 32.
[37] OLIEHOEK F A, AMATO C. A concise introduction to decentralized POMDPs[M]. Cham: Springer, 2016.
[38] WANG J, ZHANG Y, GU Y, et al. SHAQ: incorporating Shapley value theory into multi-agent Q-learning[C]//Advances in Neural Information Processing Systems 35, 2022: 5941-5954.