Journal of Frontiers of Computer Science and Technology

Review of neural network lightweighting

DUAN Yuchen,  FANG Zhenyu,  ZHENG Jiangbin   

  1. School of Software, Northwestern Polytechnical University, Xi'an 710129, China

Abstract: With the continuous progress of deep learning technology, artificial neural network models have shown unprecedented performance in many fields such as image recognition, natural language processing, and autonomous driving. These models often have millions or even billions of parameters and learn complex feature representations from large amounts of training data. However, in resource-constrained environments such as mobile devices, embedded systems, and other edge computing scenarios, the power consumption, memory footprint, and computational efficiency of a model limit the deployment of large-scale neural networks. To address this problem, researchers have proposed a variety of model compression techniques, such as pruning, knowledge distillation, neural architecture search (NAS), quantization, and low-rank decomposition, which aim to reduce the number of parameters, the computational complexity, and the storage requirements of a model while preserving its accuracy as much as possible. This paper systematically reviews the development of these model compression methods, focusing on the main principles and key techniques of each: the different strategies of pruning, such as structured and unstructured pruning; how knowledge is defined in knowledge distillation; the search space, search algorithm, and network performance evaluation in NAS; post-training quantization and quantization-aware training in quantization; and singular value decomposition and tensor decomposition in low-rank decomposition. Finally, future directions for model compression are discussed, in the hope of providing some ideas for researchers in this field.
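
As a concrete illustration of the two pruning strategies named above, the following NumPy sketch (illustrative, not from the paper; the layer shape and 50% sparsity target are assumptions) contrasts unstructured magnitude pruning, which zeroes individual small weights but leaves the tensor shape intact, with structured pruning, which removes entire rows such as output channels and genuinely shrinks the layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # hypothetical weight matrix of one layer

# Unstructured pruning: zero out the individual weights with the
# smallest magnitudes (here the bottom 50%); the shape is unchanged,
# so speedups require sparse kernels or hardware support.
threshold = np.quantile(np.abs(W), 0.5)
W_unstructured = np.where(np.abs(W) >= threshold, W, 0.0)

# Structured pruning: drop whole rows (e.g. output channels) with the
# smallest L2 norms, which shrinks the dense layer itself.
row_norms = np.linalg.norm(W, axis=1)
keep = np.sort(np.argsort(row_norms)[4:])  # keep the 4 strongest rows
W_structured = W[keep]

print((W_unstructured == 0).mean())  # 0.5 sparsity, still (8, 8)
print(W_structured.shape)            # (4, 8): a genuinely smaller layer
```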
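On how knowledge is defined in distillation, the most common choice is the teacher's softened output distribution. The sketch below is a minimal NumPy rendering of this response-based loss (in the spirit of Hinton et al.'s formulation): a temperature-scaled KL term between teacher and student plus the usual hard-label cross-entropy. The temperature T and mixing weight alpha are illustrative hyperparameters, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target KL term plus hard-label cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    # T**2 keeps the soft-target gradient on the same scale as the hard term.
    return np.mean(alpha * T**2 * kl + (1 - alpha) * ce)
```

Feature-based and relation-based definitions of knowledge replace the logits here with intermediate activations or pairwise sample similarities, but the distill-toward-the-teacher structure of the loss is the same.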
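Post-training quantization, the simpler of the two quantization regimes mentioned above, can be sketched as an asymmetric affine mapping of trained float weights onto 8-bit unsigned integers. This is a minimal illustration only; real PTQ pipelines also calibrate activation ranges on sample data, which is omitted here.

```python
import numpy as np

def quantize_u8(w):
    """Asymmetric affine quantization of a float tensor to uint8."""
    scale = (w.max() - w.min()) / 255.0
    zero_point = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.default_rng(1).normal(size=(4, 4)).astype(np.float32)
q, s, z = quantize_u8(w)
print(np.abs(w - dequantize(q, s, z)).max())  # small rounding error
```

Quantization-aware training instead inserts this quantize/dequantize round trip into the forward pass during training (with a straight-through gradient estimator), so the network learns weights that tolerate the rounding.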
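For low-rank decomposition, the SVD variant mentioned above replaces one large weight matrix with two thin factors. The NumPy sketch below truncates the SVD of an illustrative 256×512 dense layer to rank 32 (all sizes are assumptions for the example), cutting parameters and multiply-adds by roughly 5x. Note that for a random matrix the truncation error is large; the technique pays off because trained weight matrices are often approximately low-rank.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(256, 512))   # weight of a dense layer y = W @ x

# Truncated SVD: W ~ A @ B with rank r, so one big matmul becomes
# two thin ones, A @ (B @ x).
r = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]              # (256, r)
B = Vt[:r]                        # (r, 512)

x = rng.normal(size=512)
rel_err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
print(W.size, A.size + B.size)    # 131072 vs 24576 parameters
print(rel_err)                    # large here; small for low-rank layers
```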

Key words: pruning, quantization, knowledge distillation, neural architecture search (NAS), low-rank decomposition
