[1] LeCun Y, Denker J S, Solla S A. Optimal brain damage[J]. Advances in Neural Information Processing Systems, 1990, 2: 598-605.
[2] Han S, Mao H, Dally W J. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding[J]. arXiv:1510.00149, 2015.
[3] Zhang C, Tian J, Wang Y S, et al. Survey of model compression method for neural networks[J]. Computer Science, 2018, 45(10): 1-5. 张弛, 田锦, 王永森, 等. 神经网络模型压缩方法综述[J]. 计算机科学, 2018, 45(10): 1-5.
[4] Cao W L, Rui J W, Li M. Survey of neural network model compression methods[J]. Application Research of Computers, 2018, 36(3): 649-656. 曹文龙, 芮建武, 李敏. 神经网络模型压缩方法综述[J]. 计算机应用研究, 2018, 36(3): 649-656.
[5] Luo J, Wu J. An entropy-based pruning method for CNN compression[J]. arXiv:1706.05791, 2017.
[6] Yang T, Chen Y, Sze V. Designing energy-efficient convolutional neural networks using energy-aware pruning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 6071-6079.
[7] Hu Y, Sun S, Li J, et al. A novel channel pruning method for deep neural network compression[J]. arXiv:1805.11394, 2018.
[8] He Y H, Zhang X Y, Sun J. Channel pruning for accelerating very deep neural networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1389-1397.
[9] Anwar S, Sung W Y. Coarse pruning of convolutional neural networks with random masks[C]//Proceedings of the 2017 International Conference on Learning Representations, Toulon, Apr 24-26, 2017: 134-145.
[10] Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient ConvNets[J]. arXiv:1608.08710, 2016.
[11] Molchanov P, Tyree S, Karras T, et al. Pruning convolutional neural networks for resource efficient inference[J]. arXiv:1611.06440, 2016.
[12] Hu H, Peng R, Tai Y W, et al. Network trimming: a data-driven neuron pruning approach towards efficient deep architectures[J]. arXiv:1611.05128, 2016.
[13] Mittal D, Bhardwaj S, Khapra M M, et al. Recovering from random pruning: on the plasticity of deep convolutional neural networks[J]. arXiv:1801.10447, 2018.
[14] He Y, Liu P, Wang Z, et al. Filter pruning via geometric median for deep convolutional neural networks acceleration[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Washington: IEEE Computer Society, 2019: 4340-4349.
[15] Yu R, Li A, Chen C F, et al. NISP: pruning networks using neuron importance score propagation[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-23, 2018. Washington: IEEE Computer Society, 2018: 9194-9203.
[16] Guo Y, Yao A, Chen Y. Dynamic network surgery for efficient DNNs[J]. arXiv:1608.04493, 2016.
[17] Tian Q, Arbel T, Clark J J. Deep LDA-pruned nets for efficient facial gender classification[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 512-521.
[18] Cheng Y, Yu F X, Feris R S, et al. An exploration of parameter redundancy in deep networks with circulant projections[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Washington: IEEE Computer Society, 2015: 2857-2865.
[19] Hsiao T Y, Chang Y C, Chou H, et al. Filter-based deep-compression with global average pooling for convolutional networks[J]. Journal of Systems Architecture, 2019, 95: 9-18.
[20] Lin S, Ji R, Yan C, et al. Towards optimal structured CNN pruning via generative adversarial learning[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Washington: IEEE Computer Society, 2019: 2790-2799.
[21] Anwar S, Hwang K, Sung W. Structured pruning of deep convolutional neural networks[J]. ACM Journal on Emerging Technologies in Computing Systems, 2017, 13(3): 32.
[22] Jin L L, Yang W Z, Wang S L, et al. Mixed pruning method for convolutional neural network compression[J]. Journal of Chinese Computer Systems, 2018, 39(12): 2596-2601. 靳丽蕾, 杨文柱, 王思乐, 等. 一种用于卷积神经网络压缩的混合剪枝方法[J]. 小型微型计算机系统, 2018, 39(12): 2596-2601.
[23] Huang C, Chang T, Tan H, et al. Neural network pruning based on weight similarity[J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(8): 1278-1285. 黄聪, 常滔, 谭虎, 等. 基于权值相似性的神经网络剪枝[J]. 计算机科学与探索, 2018, 12(8): 1278-1285.
[24] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv:1503.02531, 2015.
[25] Zagoruyko S, Komodakis N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer[J]. arXiv:1612.03928, 2016.
[26] Wen W, Wu C, Wang Y, et al. Learning structured sparsity in deep neural networks[J]. arXiv:1608.03665, 2016.
[27] Heo D Y, Nam J Y, Ko B C. Estimation of pedestrian pose orientation using soft target training based on teacher-student framework[J]. Sensors, 2019, 19(5): 1147.
[28] Min R, Hai L, Zong J, et al. A gradually distilled CNN for SAR target recognition[J]. IEEE Access, 2019, 7: 42190-42200.
[29] Xu Z, Song Z Q. Convolution neural network compression method with scale factor[J]. Computer Engineering and Applications, 2018, 54(12): 105-109. 徐喆, 宋泽奇. 带比例因子的卷积神经网络压缩方法[J]. 计算机工程与应用, 2018, 54(12): 105-109.
[30] Rastegari M, Ordonez V, Redmon J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks[J]. arXiv:1603.05279, 2016.
[31] Lin X, Zhao C, Pan W. Towards accurate binary convolutional neural network[J]. arXiv:1711.11294, 2017.
[32] Li Z, Ni B, Zhang W, et al. Performance guaranteed network acceleration via high-order residual quantization[J]. arXiv:1708.08687, 2017.
[33] Liu Z, Wu B, Luo W, et al. Bi-Real Net: enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm[J]. arXiv:1808.00278, 2018.
[34] Courbariaux M, Bengio Y, David J P. BinaryConnect: training deep neural networks with binary weights during propagations[J]. arXiv:1511.00363, 2015.
[35] Zhu C, Han S, Mao H, et al. Trained ternary quantization[J]. arXiv:1612.01064, 2016.
[36] Xu Y, Dong X, Li Y, et al. A main/subsidiary network framework for simplifying binary neural networks[J]. arXiv:1812.04210, 2018.
[37] Li F, Zhang B, Liu B. Ternary weight networks[J]. arXiv:1605.04711, 2016.
[38] Jacob B, Kligys S, Chen B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference[J]. arXiv:1712.05877, 2017.
[39] Jung S, Son C, Lee S, et al. Learning to quantize deep networks by optimizing quantization intervals with task loss[J]. arXiv:1808.05779, 2018.
[40] Dong Y, Ni R, Li J, et al. Learning accurate low-bit deep neural networks with stochastic quantization[J]. arXiv:1708.01001, 2017.
[41] Zhou A J, Yao A B, Guo Y W, et al. Incremental network quantization: towards lossless CNNs with low-precision weights[J]. arXiv:1702.03044, 2017.
[42] Wang Y, Xu C, You S, et al. CNNpack: packing convolutional neural networks in the frequency domain[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41: 2495-2510.
[43] Qin Z D, Zhu D, Zhu X W, et al. Accelerating deep neural networks by combining block-circulant matrices and low-precision weights[J]. Electronics, 2019, 8(1): 78.
[44] Seo S, Kim J. Efficient weights quantization of convolutional neural networks using kernel density estimation based non-uniform quantizer[J]. Applied Sciences, 2019, 9(12): 2559.
[45] Tan W R, Chan C S, Aguirre H E, et al. Fuzzy qualitative deep compression network[J]. Neurocomputing, 2017, 251: 1-15.
[46] Lin J, Gan C, Han S. Defensive quantization: when efficiency meets robustness[J]. arXiv:1904.08444, 2019.
[47] Li Y, Lin S, Zhang B, et al. Exploiting kernel sparsity and entropy for interpretable CNN compression[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Washington: IEEE Computer Society, 2019: 2800-2809.
[48] Wang K, Liu Z, Lin Y, et al. HAQ: hardware-aware automated quantization[J]. arXiv:1811.08886, 2018.
[49] Howard A G, Zhu M, Chen B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[J]. arXiv:1704.04861, 2017.
[50] Sandler M, Howard A, Zhu M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-23, 2018. Washington: IEEE Computer Society, 2018: 4510-4520.
[51] Zhang X Y, Zhou X Y, Lin M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-23, 2018. Washington: IEEE Computer Society, 2018: 6848-6856.
[52] Ma N N, Zhang X Y, Zheng H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]//Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 122-138.
[53] Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[J]. arXiv:1602.07360, 2016.
[54] Chollet F. Xception: deep learning with depthwise separable convolutions[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 1251-1258.
[55] Mehta S, Rastegari M, Shapiro L, et al. ESPNetv2: a light-weight, power efficient, and general purpose convolutional neural network[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Washington: IEEE Computer Society, 2019: 9190-9200.
[56] Li X, Long R, Yan J, et al. TANet: a tiny plankton classification network for mobile devices[J]. Mobile Information Systems, 2019(4): 1-8.
[57] Jin X, Yuan X, Feng J, et al. Training skinny deep neural networks with iterative hard thresholding methods[J]. arXiv:1607.05423, 2016.
[58] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[J]. arXiv:1611.05431, 2016.
[59] Yu Y, Huang J, Du W, et al. Design and analysis of a lightweight context fusion CNN scheme for crowd counting[J]. Sensors, 2019, 19(9): 2013.
[60] Zhu G, Wang J, Wang P, et al. Feature distilled tracking[J]. IEEE Transactions on Cybernetics, 2019, 49(2): 440-452.
[61] Elsken T, Metzen J H, Hutter F. Neural architecture search: a survey[J]. arXiv:1808.05377, 2018.
[62] Zoph B, Le Q V. Neural architecture search with reinforcement learning[J]. arXiv:1611.01578, 2016.
[63] Fan S, Yu H, Lu D, et al. CSCC: convolution split compression calculation algorithm for deep neural network[J]. IEEE Access, 2019, 7: 71607-71615.
[64] Liu H, Simonyan K, Yang Y. DARTS: differentiable architecture search[J]. arXiv:1806.09055, 2018.
[65] Frankle J, Carbin M. The lottery ticket hypothesis: finding sparse, trainable neural networks[J]. arXiv:1803.03635, 2018.
[66] Tan M, Chen B, Pang R, et al. MnasNet: platform-aware neural architecture search for mobile[C]//Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Washington: IEEE Computer Society, 2019: 2820-2828.
[67] Cai H, Zhu L G, Han S. ProxylessNAS: direct neural architecture search on target task and hardware[J]. arXiv:1812.00332, 2018.
[68] Bergomi M G, Frosini P, Giorgi D, et al. Towards a topological-geometrical theory of group equivariant non-expansive operators for data analysis and machine learning[J]. Nature Machine Intelligence, 2019, 1: 423-433.
[69] Prost-Boucle A, Bourge A, Petrot F. High-efficiency convolutional ternary neural networks with custom adder trees and weight compression[J]. ACM Transactions on Reconfigurable Technology and Systems, 2018, 11(3): 1-24.
[70] Tran D T, Iosifidis A, Gabbouj M. Improving efficiency in convolutional neural networks with multilinear filters[J]. Neural Networks, 2018, 105: 328-339.
[71] Tan M, Le Q V. EfficientNet: rethinking model scaling for convolutional neural networks[J]. arXiv:1905.11946, 2019.
[72] Zhu S L, Dong X, Su H. Binary ensemble neural network: more bits per network or more networks per bit?[J]. arXiv:1806.07550, 2018.
[73] Wang X, Kan M, Shan S, et al. Fully learnable group convolution for acceleration of deep neural networks[J]. arXiv:1904.00346, 2019.
[74] Pham H, Guan M Y, Zoph B, et al. Efficient neural architecture search via parameter sharing[J]. arXiv:1802.03268, 2018.
[75] Baker B, Gupta O, Naik N, et al. Designing neural network architectures using reinforcement learning[J]. arXiv:1611.02167, 2016.
[76] Zhong Z, Yan J, Wu W, et al. Practical block-wise neural network architecture generation[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Jun 18-23, 2018. Washington: IEEE Computer Society, 2018: 2423-2432.
[77] Zhong Z, Yan J, Liu C L. Practical network blocks design with Q-learning[J]. arXiv:1708.05552, 2017.
[78] Luo R, Tian F, Qin T, et al. Neural architecture optimization[J]. arXiv:1808.07233, 2018.
[79] Liu C, Zoph B, Neumann M, et al. Progressive neural architecture search[J]. arXiv:1712.00559, 2017.
[80] Kandasamy K, Neiswanger W, Schneider J, et al. Neural architecture search with Bayesian optimisation and optimal transport[J]. arXiv:1802.07191, 2018.
[81] Zela A, Klein A, Falkner S, et al. Towards automated deep learning: efficient joint neural architecture and hyperparameter search[J]. arXiv:1807.06906, 2018.
[82] Hsu C H, Chang S H, Liang J H, et al. MONAS: multi-objective neural architecture search using reinforcement learning[J]. arXiv:1806.10332, 2018.
[83] Dong J D, Cheng A C, Juan D C, et al. DPP-Net: device-aware progressive search for Pareto-optimal neural architectures[J]. arXiv:1806.08198, 2018.
[84] Cheng Y, Wang D, Zhou P, et al. Model compression and acceleration for deep neural networks: the principles, progress, and challenges[J]. IEEE Signal Processing Magazine, 2018, 35(1): 126-136.