Journal of Frontiers of Computer Science and Technology

Review of neural network lightweighting

DUAN Yuchen,  FANG Zhenyu,  ZHENG Jiangbin   

  1. School of Software, Northwestern Polytechnical University, Xi'an 710129, China

Abstract: With the continuous progress of deep learning technology, artificial neural network models have shown unprecedented performance in many fields such as image recognition, natural language processing, and autonomous driving. These models often have millions or even billions of parameters and learn complex feature representations from large amounts of training data. However, in resource-constrained environments such as mobile devices, embedded systems, and other edge computing scenarios, the power consumption, memory footprint, and computational efficiency of a model limit the deployment of large-scale neural networks. To address this problem, researchers have proposed a variety of model compression techniques, such as pruning, knowledge distillation, neural architecture search (NAS), quantization, and low-rank decomposition, which aim to reduce the number of parameters, the computational complexity, and the storage requirements of a model while preserving its accuracy as much as possible. This paper systematically reviews the development of these model compression methods, focusing on the main principles and key techniques of each: the different strategies of pruning, such as structured and unstructured pruning; how knowledge is defined in knowledge distillation; the search space, search algorithm, and network performance evaluation in NAS; post-training quantization and quantization-aware training in quantization; and singular value decomposition and tensor decomposition in low-rank decomposition. Finally, future directions for model compression are discussed, in the hope of providing some ideas for researchers in this field.
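
As a concrete illustration of the two pruning strategies named above, the following NumPy sketch (illustrative, not from the paper; the layer shape and 50% sparsity target are assumptions) contrasts unstructured magnitude pruning, which zeroes individual small weights but leaves the tensor shape intact, with structured pruning, which removes entire rows such as output channels and genuinely shrinks the layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # hypothetical weight matrix of one layer

# Unstructured pruning: zero out the individual weights with the
# smallest magnitudes (here the bottom 50%); the shape is unchanged,
# so speedups require sparse kernels or hardware support.
threshold = np.quantile(np.abs(W), 0.5)
W_unstructured = np.where(np.abs(W) >= threshold, W, 0.0)

# Structured pruning: drop whole rows (e.g. output channels) with the
# smallest L2 norms, which shrinks the dense layer itself.
row_norms = np.linalg.norm(W, axis=1)
keep = np.sort(np.argsort(row_norms)[4:])  # keep the 4 strongest rows
W_structured = W[keep]

print((W_unstructured == 0).mean())  # 0.5 sparsity, still (8, 8)
print(W_structured.shape)            # (4, 8): a genuinely smaller layer
```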
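On how knowledge is defined in distillation, the most common choice is the teacher's softened output distribution. The sketch below is a minimal NumPy rendering of this response-based loss (in the spirit of Hinton et al.'s formulation): a temperature-scaled KL term between teacher and student plus the usual hard-label cross-entropy. The temperature T and mixing weight alpha are illustrative hyperparameters, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target KL term plus hard-label cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    # T**2 keeps the soft-target gradient on the same scale as the hard term.
    return np.mean(alpha * T**2 * kl + (1 - alpha) * ce)
```

Feature-based and relation-based definitions of knowledge replace the logits here with intermediate activations or pairwise sample similarities, but the distill-toward-the-teacher structure of the loss is the same.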
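Post-training quantization, the simpler of the two quantization regimes mentioned above, can be sketched as an asymmetric affine mapping of trained float weights onto 8-bit unsigned integers. This is a minimal illustration only; real PTQ pipelines also calibrate activation ranges on sample data, which is omitted here.

```python
import numpy as np

def quantize_u8(w):
    """Asymmetric affine quantization of a float tensor to uint8."""
    scale = (w.max() - w.min()) / 255.0
    zero_point = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.default_rng(1).normal(size=(4, 4)).astype(np.float32)
q, s, z = quantize_u8(w)
print(np.abs(w - dequantize(q, s, z)).max())  # small rounding error
```

Quantization-aware training instead inserts this quantize/dequantize round trip into the forward pass during training (with a straight-through gradient estimator), so the network learns weights that tolerate the rounding.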
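For low-rank decomposition, the SVD variant mentioned above replaces one large weight matrix with two thin factors. The NumPy sketch below truncates the SVD of an illustrative 256×512 dense layer to rank 32 (all sizes are assumptions for the example), cutting parameters and multiply-adds by roughly 5x. Note that for a random matrix the truncation error is large; the technique pays off because trained weight matrices are often approximately low-rank.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(256, 512))   # weight of a dense layer y = W @ x

# Truncated SVD: W ~ A @ B with rank r, so one big matmul becomes
# two thin ones, A @ (B @ x).
r = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]              # (256, r)
B = Vt[:r]                        # (r, 512)

x = rng.normal(size=512)
rel_err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
print(W.size, A.size + B.size)    # 131072 vs 24576 parameters
print(rel_err)                    # large here; small for low-rank layers
```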

Key words: pruning, quantization, knowledge distillation, neural architecture search (NAS), low-rank decomposition
