Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (2): 345-362. DOI: 10.3778/j.issn.1673-9418.2305057
QI Xuanhao, ZHI Min
Online:
2024-02-01
Published:
2024-02-01
Abstract: The attention mechanism has become one of the most popular and important techniques in deep learning for image processing. Owing to its plug-and-play convenience, it is widely used in a variety of deep learning models across image processing tasks. An attention mechanism weights the input features so that the model concentrates on the most important regions, improving the accuracy and performance of image processing tasks. First, this paper divides the development of attention mechanisms into four stages and, on that basis, reviews and summarizes the research status and progress of channel attention, spatial attention, mixed channel-spatial attention, and self-attention. Second, it discusses in detail the core ideas, key structures, and concrete implementations of these mechanisms, and further summarizes the strengths and weaknesses of the models involved. Finally, through comparative experiments on current mainstream attention mechanisms and analysis of the results, it discusses the problems attention mechanisms still face in image processing and looks ahead to their future development, providing a reference for further research.
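The core idea the abstract describes — reweighting input features so the model attends to the most informative channels — can be illustrated with a minimal NumPy sketch of squeeze-and-excitation-style channel attention (in the spirit of Hu et al., CVPR 2018). The weight shapes, reduction ratio `r`, and random inputs below are illustrative assumptions, not the implementation from any surveyed paper.

```python
import numpy as np

def se_channel_attention(x, w1, w2):
    """Sketch of squeeze-and-excitation channel attention.
    x: feature map of shape (C, H, W)
    w1: bottleneck weights of shape (C // r, C)
    w2: expansion weights of shape (C, C // r)
    """
    # Squeeze: global average pooling gives one descriptor per channel
    z = x.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: bottleneck MLP, ReLU then sigmoid gating
    s = np.maximum(w1 @ z, 0.0)                  # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))          # shape (C,), weights in (0, 1)
    # Scale: reweight each channel of the input feature map
    return x * s[:, None, None]

# Toy usage with hypothetical sizes: 8 channels, 4x4 map, reduction ratio 2
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = se_channel_attention(x, w1, w2)
print(y.shape)  # (8, 4, 4): same shape, channels rescaled by learned weights
```

Spatial attention follows the same pattern but pools over the channel axis instead, producing an H×W weight map; self-attention replaces the pooled descriptor with pairwise similarity between all positions.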
QI Xuanhao, ZHI Min. Review of Attention Mechanisms in Image Processing[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(2): 345-362.