计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2021, Vol. 15, Issue (10): 1812-1829. DOI: 10.3778/j.issn.1673-9418.2104022

• Survey · Exploration •

Review of Knowledge Distillation in Convolutional Neural Network Compression

MENG Xianfa, LIU Fang, LI Guang, HUANG Mengmeng   

  1. National Key Laboratory of Science and Technology on Automatic Target Recognition, National University of Defense Technology, Changsha 410000, China
  • Online: 2021-10-01  Published: 2021-09-30

Abstract:

In recent years, convolutional neural networks (CNN) have achieved remarkable results in many image analysis applications, owing to their powerful ability to extract and represent features. However, these continuous performance gains have come almost entirely from ever deeper and larger network models, so deploying a complete CNN often requires a huge memory footprint and the support of high-performance computing units (such as GPUs). This limits the wide application of CNN on embedded devices with constrained computing resources and on mobile terminals with strict real-time requirements, making network lightweighting an urgent need. The main network compression and acceleration techniques addressing this problem are knowledge distillation, network pruning, parameter quantization, low-rank decomposition, and lightweight network design. This paper first introduces the basic structure and development of convolutional neural networks, and briefly describes and compares five typical network compression methods. It then reviews and summarizes knowledge distillation methods in detail, and compares different methods experimentally on the CIFAR datasets. Furthermore, it introduces the current evaluation system for knowledge distillation methods and gives a comparative analysis and evaluation of the various types of methods. Finally, it offers preliminary thoughts on future research directions for this technique.
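For readers unfamiliar with the basic technique, the classic distillation objective that the surveyed methods build on (Hinton et al., 2015) can be sketched in a few lines of PyTorch. The following is an illustrative sketch only, not code from the paper; the names kd_loss, temperature, and alpha are hypothetical.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    # Soften both output distributions with the temperature T.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # The KL term is scaled by T^2 so its gradients stay comparable to the CE term.
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example: a random CIFAR-10-style batch with 10 classes.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student_logits, teacher_logits, labels))

Here alpha weights the softened teacher distribution against the hard labels; the distillation variants surveyed in the paper differ mainly in what knowledge is transferred and how this basic objective is extended.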

Key words: convolutional neural network (CNN), knowledge distillation, neural network compression, lightweight network