Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (11): 2721-2733. DOI: 10.3778/j.issn.1673-9418.2204107

• Artificial Intelligence · Pattern Recognition •

Data-Free Model Compression via Multi-teacher Contrastive Knowledge Inversion

林振元,林绍辉,姚益武,何高奇,王长波,马利庄   

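  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  2. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  • Publication date: 2023-11-01 Release date: 2023-11-01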

Multi-teacher Contrastive Knowledge Inversion for Data-Free Distillation

LIN Zhenyuan, LIN Shaohui, YAO Yiwu, HE Gaoqi, WANG Changbo, MA Lizhuang   

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  2. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  • Online:2023-11-01 Published:2023-11-01

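Abstract: Knowledge distillation is an effective method for compressing deep neural networks, but the original data are often unavailable because of user privacy protection, data confidentiality, or transmission constraints. Existing data-free knowledge distillation methods rely on the biased feature statistics of a single teacher model, so the generated data show poor diversity and generalization compared with the original data, which lowers the accuracy of the compressed model. To address these problems, a data-free model compression method based on multi-teacher contrastive knowledge inversion (MTCKI) is proposed. It extracts knowledge from multiple available teacher models and fuses it into the student model, which removes the bias introduced by a single model's statistics and improves the generalization of the synthesized images. To increase the diversity of the synthesized images, a contrastive learning strategy compares the images generated in the current batch with previously generated images, forcing the generator to synthesize images that are dissimilar to the historical ones. In addition, a multi-teacher-student contrastive strategy is proposed to further strengthen the representation ability of the student network. Experiments show that the method not only generates visually satisfactory images but also outperforms existing methods on multiple metrics. The synthesized images are closer to the distribution of the original dataset, and a dataset of images generated once can be reused to train different models.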

Keywords: model compression, data-free, knowledge distillation, data protection, privacy protection

Abstract: Knowledge distillation is an effective method for model compression when the training data are accessible. However, because of privacy, confidentiality, or transmission limitations, the original data are often unavailable. Existing data-free knowledge distillation methods use only the biased feature statistics contained in a single model, so the synthetic images suffer from low generalizability and diversity and the student model performs unsatisfactorily. To address these problems, this paper proposes a multi-teacher contrastive knowledge inversion (MTCKI) method that extracts model-specific knowledge from the available teacher models and fuses it into a student model to eliminate model bias. Further, this paper improves the diversity of the synthesized images using contrastive learning, which encourages the synthetic images to be distinguishable from previously stored images. Meanwhile, this paper proposes a contrastive loss between the multiple teachers and the student to improve the feature representation ability of the student network. Experiments demonstrate that MTCKI not only generates visually satisfactory images but also outperforms existing state-of-the-art approaches. The synthesized images are much closer to the distribution of the original dataset, and they need to be generated only once to provide comprehensive guidance for various networks rather than for a specific one.

Key words: model compression, data-free, knowledge distillation, data protection, privacy protection
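
The two ideas named in the abstract, fusing knowledge from several teachers and contrasting newly synthesized images with stored ones, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the loss formulas, so the equal-weight averaging of teacher outputs, the temperature values, the InfoNCE-style form of the contrastive term, and every function name below are assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_kd_loss(teacher_logits_list, student_logits, temperature=4.0):
    """KL(ensemble || student) for one sample.

    Assumption: the teachers are fused by an equal-weight average of their
    softened class distributions; the paper may weight them differently.
    """
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    n = len(teacher_probs)
    ensemble = [sum(p[k] for p in teacher_probs) / n
                for k in range(len(student_logits))]
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(ensemble, student) if p > 0)


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_diversity_loss(anchor, positive, memory_bank, tau=0.1):
    """InfoNCE-style loss (assumed form).

    `anchor` and `positive` are features of two views of the same synthetic
    image; `memory_bank` holds features of previously generated images,
    which act as negatives. The loss is small when the anchor is unlike
    the stored history, which rewards diverse synthesis.
    """
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, h) / tau) for h in memory_bank)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    teachers = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
    print(multi_teacher_kd_loss(teachers, [1.0, 1.0, 0.0]))
    print(contrastive_diversity_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]]))
```

In this sketch the distillation term is zero only when the student matches the averaged teacher distribution, and the contrastive term grows when a new image's features resemble those already in the memory bank.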