Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (6): 1457-1475.DOI: 10.3778/j.issn.1673-9418.2312006
• Frontiers·Surveys •
XIA Qingfeng, XU Ke'er, LI Mingyang, HU Kai, SONG Lipeng, SONG Zhiqiang, SUN Ning
Online: 2024-06-01
Published: 2024-05-31
XIA Qingfeng, XU Ke'er, LI Mingyang, HU Kai, SONG Lipeng, SONG Zhiqiang, SUN Ning. Review of Attention Mechanisms in Reinforcement Learning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(6): 1457-1475.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2312006