Journal of Frontiers of Computer Science and Technology, 2025, Vol. 19, Issue (1): 30-44. DOI: 10.3778/j.issn.1673-9418.2403009

• Frontiers·Surveys •

Review of Research on CNN and Visual Transformer Hybrid Models in Image Processing

GUO Jialin, ZHI Min, YIN Yanjun, GE Xiangwei

  1. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
  • Online: 2025-01-01  Published: 2024-12-31

Abstract: Convolutional neural networks (CNNs) and vision Transformers are two of the most important deep learning models in image processing, and both have achieved remarkable results after years of research. In recent years, hybrid models that combine CNNs with vision Transformers have been emerging. A growing body of work offsets the weaknesses of each model while exploiting their respective strengths, and these hybrids perform well on image processing tasks. This paper surveys CNN and vision Transformer hybrid models. First, it outlines the architectures, advantages, and disadvantages of CNN and vision Transformer models, and summarizes the concept and advantages of hybrid models. Second, it reviews the research status and progress of hybrid models under four fusion approaches: serial structure fusion, parallel structure fusion, hierarchical cross structure fusion, and other fusion methods. It summarizes and analyzes the main representative models of each approach and compares typical hybrid models from several perspectives. Third, it describes applications of hybrid models in specific image processing areas such as image recognition, image classification, object detection, and image segmentation, demonstrating their applicability and efficiency in practice. Finally, it analyzes future research directions for hybrid models and offers an outlook on their further research and application in image processing.

Key words: convolutional neural network (CNN), visual Transformer, hybrid model, image processing, deep learning