Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (6): 1457-1475.DOI: 10.3778/j.issn.1673-9418.2312006
• Frontiers·Surveys •
XIA Qingfeng, XU Ke'er, LI Mingyang, HU Kai, SONG Lipeng, SONG Zhiqiang, SUN Ning
Online: 2024-06-01
Published: 2024-05-31
XIA Qingfeng, XU Ke'er, LI Mingyang, HU Kai, SONG Lipeng, SONG Zhiqiang, SUN Ning. Review of Attention Mechanisms in Reinforcement Learning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(6): 1457-1475.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2312006