Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (6): 1457-1475. DOI: 10.3778/j.issn.1673-9418.2312006
XIA Qingfeng, XU Ke'er, LI Mingyang, HU Kai, SONG Lipeng, SONG Zhiqiang, SUN Ning
Online: 2024-06-01
Published: 2024-05-31
摘要: 近年来,强化学习与注意力机制的结合在算法研究领域备受瞩目。在强化学习算法中,注意力机制的应用在提高算法性能方面发挥了重要作用。重点聚焦于注意力机制在深度强化学习中的发展,审视了其在多智能体强化学习领域的应用,并对相关研究成果进行调研。首先,介绍了注意力机制和强化学习的研究背景与发展历程,并调研了该领域中的相关实验平台;然后,回顾了强化学习与注意力机制的经典算法,并从不同角度对注意力机制进行分类;接着,对注意力机制在强化学习领域的应用进行了梳理,根据三种任务类型(完全合作型、完全竞争型和混合合作竞争型)进行分类分析,重点关注了多智能体领域的应用情况;最后,总结了注意力机制对强化学习算法的改进作用,并展望了该领域所面临的挑战和未来的研究前景。
XIA Qingfeng, XU Ke'er, LI Mingyang, HU Kai, SONG Lipeng, SONG Zhiqiang, SUN Ning. Review of Attention Mechanisms in Reinforcement Learning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(6): 1457-1475.