计算机科学与探索 (Journal of Frontiers of Computer Science and Technology) ›› 2025, Vol. 19 ›› Issue (1): 30-44. DOI: 10.3778/j.issn.1673-9418.2403009
郭佳霖,智敏,殷雁君,葛湘巍
GUO Jialin, ZHI Min, YIN Yanjun, GE Xiangwei
Online: 2025-01-01
Published: 2024-12-31
Abstract: Convolutional neural networks (CNN) and the Vision Transformer are the two most important deep learning models in image processing today; after years of sustained research and refinement, both have achieved remarkable results in this field. In recent years, hybrid models combining CNN and Vision Transformer have been gaining ground: a growing body of work overcomes the weaknesses of each model while efficiently exploiting their respective strengths, delivering excellent performance on image processing tasks. This paper presents an in-depth review of CNN-Vision Transformer hybrid models. It first outlines the architectures, strengths, and weaknesses of the CNN and Vision Transformer models, and summarizes the concept and advantages of hybrid models. It then comprehensively reviews the research status and practical progress of hybrid models along four lines: serial fusion structures, parallel fusion structures, hierarchically interleaved (cross-layer) fusion structures, and other fusion approaches; the main representative models of each fusion approach are summarized and analyzed, and typical hybrid models are evaluated and compared from multiple perspectives. The paper further surveys applications of hybrid models in specific image processing domains such as image recognition, image classification, object detection, and image segmentation, demonstrating their applicability and efficiency in practice. Finally, future research directions for hybrid models are analyzed in depth, and prospects for their subsequent research and application in image processing are given.
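To make the serial and parallel fusion patterns mentioned in the abstract concrete, the following is a minimal PyTorch sketch, not drawn from the paper or from any specific surveyed model: the class names SerialHybrid and ParallelHybrid, the layer sizes, and the concatenation-based fusion are illustrative assumptions only. The serial variant feeds CNN feature maps into a Transformer encoder as a token sequence; the parallel variant runs a CNN branch and a Transformer branch side by side and fuses their pooled features before the classification head.

```python
# Minimal illustrative sketch (assumed layer sizes, not from the survey).
import torch
import torch.nn as nn


class SerialHybrid(nn.Module):
    """Serial fusion: a CNN stem extracts local features, then a Transformer
    encoder models global dependencies over the resulting token sequence."""

    def __init__(self, in_ch=3, dim=64, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(                    # CNN stage: downsample and embed
            nn.Conv2d(in_ch, dim, kernel_size=7, stride=4, padding=3),
            nn.BatchNorm2d(dim),
            nn.GELU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        f = self.stem(x)                              # (B, dim, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)         # (B, N, dim) token sequence
        tokens = self.encoder(tokens)                 # global self-attention
        return self.head(tokens.mean(dim=1))          # pooled classification


class ParallelHybrid(nn.Module):
    """Parallel fusion: a CNN branch and a Transformer branch process the same
    input side by side; their features are concatenated before the head."""

    def __init__(self, in_ch=3, dim=64, num_classes=10):
        super().__init__()
        self.cnn_branch = nn.Sequential(
            nn.Conv2d(in_ch, dim, kernel_size=3, stride=4, padding=1),
            nn.GELU(),
            nn.AdaptiveAvgPool2d(1),                  # (B, dim, 1, 1)
        )
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=8, stride=8)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.vit_branch = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(2 * dim, num_classes)   # fuse by concatenation

    def forward(self, x):
        local_feat = self.cnn_branch(x).flatten(1)             # (B, dim)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)
        global_feat = self.vit_branch(tokens).mean(dim=1)      # (B, dim)
        return self.head(torch.cat([local_feat, global_feat], dim=1))


if __name__ == "__main__":
    x = torch.randn(2, 3, 224, 224)
    print(SerialHybrid()(x).shape)    # torch.Size([2, 10])
    print(ParallelHybrid()(x).shape)  # torch.Size([2, 10])
```

Hierarchically interleaved (cross-layer) fusion, the third family the survey covers, would instead exchange features between the two branches at several intermediate stages rather than only at the output; it is omitted here to keep the sketch short.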
郭佳霖, 智敏, 殷雁君, 葛湘巍. 图像处理中CNN与视觉Transformer混合模型研究综述[J]. 计算机科学与探索, 2025, 19(1): 30-44.
GUO Jialin, ZHI Min, YIN Yanjun, GE Xiangwei. Review of Research on CNN and Visual Transformer Hybrid Models in Image Processing[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(1): 30-44.