Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

• Academic Research •

A Review of Research on CNN and Visual Transformer Hybrid Models in Image Processing

GUO Jialin, ZHI Min, YIN Yanjun, GE Xiangwei

  1. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China

Abstract: Convolutional neural networks (CNNs) and vision Transformers are currently the two most important deep learning models in image processing. Years of sustained research and development have brought both to remarkable achievements in this field. In recent years, hybrid models that combine CNNs and vision Transformers have been steadily gaining ground. A broad body of work seeks to overcome the weaknesses of each model while exploiting the strengths of both, and these hybrids have delivered excellent results on image processing tasks. This paper presents an in-depth review of CNN and vision Transformer hybrid models. First, it outlines the architectures, strengths, and weaknesses of CNNs and vision Transformers, and summarizes the concept and advantages of hybrid models. Second, it systematically categorizes hybrid models, introduces their main representative models, and describes their applications to specific image processing tasks from multiple perspectives. Finally, it analyzes future research directions for hybrid models and offers a forward-looking outlook.

Key words: CNN, vision Transformer, hybrid model, image processing, deep learning