[1] DONG X Y, YAN T R, LV Y, et al. Multi-agent coordinated control and collision avoidance with unknown disturbances[J]. Transactions of Nanjing University of Aeronautics & Astronautics, 2022(2): 176-185.
[2] SONG Y, STEINWEG M, KAUFMANN E, et al. Autonomous drone racing with deep reinforcement learning[C]//Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, Prague, Sep 27-Oct 1, 2021. Piscataway: IEEE, 2021: 1205-1212.
[3] 杨庆玉. 基于深度强化学习的多智能体搬运调度方法研究[D]. 秦皇岛: 燕山大学, 2022.
YANG Q Y. Research on multi-agent handling scheduling method based on deep reinforcement learning[D]. Qinhuangdao: Yanshan University, 2022.
[4] 黄天云, 陈雪波, 徐望宝, 等. 基于松散偏好规则的群体机器人系统自组织协作围捕[J]. 自动化学报, 2013, 39(1): 57-68.
HUANG T Y, CHEN X B, XU W B, et al. A self-organizing cooperative hunting by swarm robotic systems based on loose-preference rule[J]. Acta Automatica Sinica, 2013, 39(1): 57-68.
[5] 李瑞珍, 杨惠珍, 萧丛杉. 基于动态围捕点的多机器人协同策略[J]. 控制工程, 2019, 26(3): 510-514.
LI R Z, YANG H Z, XIAO C S. Multi-robot cooperative strategy based on dynamic trapping points[J]. Control Engineering of China, 2019, 26(3): 510-514.
[6] 蒋骁迪, 甘文洋. 一种新型多AUV水下围捕路径规划算法[J]. 计算机仿真, 2021, 38(9): 376-380.
JIANG X D, GAN W Y. A novel multi-AUV underwater trapping path planning algorithm[J]. Computer Simulation, 2021, 38(9): 376-380.
[7] 刘彦昊, 佘浩平, 蒙波, 等. 基于狼群优化的卫星集群对空间目标围捕方法[J/OL]. 北京航空航天大学学报 [2023-09-25]. https://doi.org/10.13700/j.bh.1001-5965.2022.0877.
LIU Y H, SHE H P, MENG B, et al. Method of satellite cluster capturing space targets based on wolf pack optimization[J/OL]. Journal of Beijing University of Aeronautics and Astronautics [2023-09-25]. https://doi.org/10.13700/j.bh.1001-5965.2022.0877.
[8] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998.
[9] LI J, PAN Q, HONG B. A new approach of multi-robot cooperative pursuit based on association rule data mining[J]. International Journal of Advanced Robotic Systems, 2010, 7(3): 1169-1174.
[10] LIU J, LIU S H, WU H Y, et al. A pursuit-evasion algorithm based on hierarchical reinforcement learning[C]//Proceedings of the 2009 International Conference on Measuring Technology and Mechatronics Automation, Zhangjiajie, Apr 11-12, 2009. Piscataway: IEEE, 2009: 482-486.
[11] AWHEDA M D, SCHWARTZ H M. A decentralized fuzzy learning algorithm for pursuit-evasion differential games with superior evaders[J]. Journal of Intelligent & Robotic Systems, 2016, 83(1): 35-53.
[12] LAUER M, RIEDMILLER M. An algorithm for distributed reinforcement learning in cooperative multi-agent systems[C]//Proceedings of the 17th International Conference on Machine Learning, Stanford, Jun 29-Jul 2, 2000. New York: ACM, 2000: 535-542.
[13] LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Advances in Neural Information Processing Systems 30, Long Beach, Dec 4-9, 2017. Red Hook: Curran Associates, 2017: 6379-6390.
[14] ZHOU X, ZHOU S, MOU X, et al. Multirobot collaborative pursuit target robot by improved MADDPG[EB/OL]. (2022-02-25) [2023-09-25]. https://www.hindawi.com/journals/cin/2022/4757394/.
[15] ZHANG Z, ZHOU B, LI G, et al. Dual-layer distributed optimal operation method for island microgrid based on adaptive consensus control and two-stage MATD3 algorithm[J]. Journal of Marine Science and Engineering, 2023, 11(6): 1201.
[16] FOERSTER J, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients[C]//Proceedings of the 2018 AAAI Conference on Artificial Intelligence, New Orleans, Feb 2-7, 2018. Menlo Park: AAAI, 2018: 2974-2982.
[17] 刘峰, 魏瑞, 丁超, 等. 面向多机协同的Att-MADDPG围捕控制方法设计[J]. 空军工程大学学报(自然科学版), 2021, 22(3): 9-14.
LIU F, WEI R, DING C, et al. Design of Att-MADDPG round-up control method for multi-aircraft coordination[J]. Journal of Air Force Engineering University (Natural Science Edition), 2021, 22(3): 9-14.
[18] 王凤英, 陈莹, 袁帅, 等. 自注意力机制结合DDPG的机器人路径规划研究[J/OL]. 计算机工程与应用 [2023-10-10]. http://kns.cnki.net/kcms/detail/11.2127.TP.20230920.0937.010.html.
WANG F Y, CHEN Y, YUAN S, et al. Research on robot path planning based on self-attention mechanism combined with DDPG[J/OL]. Computer Engineering and Applications [2023-10-10]. http://kns.cnki.net/kcms/detail/11.2127.TP.20230920.0937.010.html.
[19] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[EB/OL]. [2023-09-25]. https://arxiv.org/abs/1511.05952.
[20] MA J C, LU H M, XIAO J H, et al. Multi-robot target encirclement control with collision avoidance via deep reinforcement learning[J]. Journal of Intelligent & Robotic Systems, 2020, 99: 371-386.
[21] 符小卫, 徐哲, 朱金冬, 等. 基于PER-MATD3的多无人机攻防对抗机动决策[J]. 航空学报, 2023, 44(7): 196-209.
FU X W, XU Z, ZHU J D, et al. Offensive and defensive adversarial maneuver decision of multi-UAV based on PER-MATD3[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(7): 196-209.
[22] 孙彧, 徐越, 潘宣宏, 等. 基于后验经验回放的MADDPG算法[J]. 指挥信息系统与技术, 2021, 12(6): 78-84.
SUN Y, XU Y, PAN X H, et al. MADDPG algorithm based on posterior experience replay[J]. Command Information System and Technology, 2021, 12(6): 78-84.
[23] 郭玥秀, 杨伟, 刘琦, 等. 残差网络研究综述[J]. 计算机应用研究, 2020, 37(5): 1292-1297.
GUO Y X, YANG W, LIU Q, et al. Review of residual network research[J]. Application Research of Computers, 2020, 37(5): 1292-1297.
[24] SUI D, XU W P, ZHANG K. Study on the resolution of multi-aircraft flight conflicts based on an IDQN[J]. Chinese Journal of Aeronautics, 2022, 35(2): 195-213.