Journal of Frontiers of Computer Science and Technology

• Academic Research •


Dynamic Backdoor Attack Based on Adversarial Optimization in Singular Value Space

PENG Ziming, DING Jianwei, YAO Jiawang, TIAN Huawei

  1. School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China


Abstract: Deep learning models have been widely applied in numerous fields owing to their exceptional performance, but research has shown that they are significantly vulnerable to backdoor attacks. A backdoor attack compromises model reliability through a covert trigger mechanism: when a preset trigger activates the hidden backdoor, the model executes malicious behavior. Current backdoor attacks rely mainly on trigger patterns built from spatial- or frequency-domain perturbations, and most use sample-agnostic static triggers, which makes it relatively easy for defense systems to detect and remove the threat. To address the insufficient stealthiness and weak robustness of existing attacks, this paper proposes a dynamic backdoor attack based on staged adversarial optimization in the singular value space. First, a generator produces sample-specific triggers, and singular value decomposition (SVD) is used to extract the primary and secondary features of clean images and triggers; the trigger information is embedded into the secondary features of the clean image, while the primary features are preserved to keep the backdoor stealthy. Second, a staged training framework is introduced: the first stage jointly optimizes the trigger generator and the classifier to maximize the effectiveness of the backdoor attack, and the second stage continues training the backdoored model with the optimized trigger generator. To validate the stealthiness and effectiveness of the proposed method, the attack is evaluated on four benchmark datasets. Experimental results show that the proposed method achieves higher attack success rates than existing attacks on all four datasets, causes almost no accuracy degradation on benign samples, and evades four state-of-the-art backdoor defense methods. The experiments further confirm that the sensitivity of deep models to singular-value perturbations can be maliciously exploited and that existing defense mechanisms struggle to detect such attacks, revealing new security risks for AI models.
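The core embedding step described above can be illustrated with a minimal sketch. The code below is an assumption-laden NumPy illustration, not the paper's implementation: the rank threshold k separating primary from secondary components, the blending weight alpha, and the per-channel treatment are hypothetical choices, and the random trigger stands in for the output of the learned sample-specific generator.

```python
# Minimal sketch of singular-value-space trigger embedding (illustrative).
# Hypothetical parameters: rank threshold k, blending weight alpha.
import numpy as np

def embed_trigger_svd(clean, trigger, k=8, alpha=0.1):
    """Embed trigger information into the secondary (small-singular-value)
    components of one image channel, keeping the top-k primary components
    of the clean image intact for stealthiness."""
    U, s, Vt = np.linalg.svd(clean, full_matrices=False)
    Ut, st, Vtt = np.linalg.svd(trigger, full_matrices=False)

    # Primary features: top-k singular components of the clean image (kept).
    primary = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Secondary features: residual components beyond rank k.
    sec_clean = (U[:, k:] * s[k:]) @ Vt[k:, :]
    sec_trigger = (Ut[:, k:] * st[k:]) @ Vtt[k:, :]

    # Perturb only the secondary subspace; the dominant structure is kept.
    return primary + (1.0 - alpha) * sec_clean + alpha * sec_trigger

# Toy usage with a random channel and a random stand-in trigger.
rng = np.random.default_rng(0)
x = rng.random((32, 32))
t = rng.random((32, 32))                     # in the paper, t = generator(x)
x_poisoned = embed_trigger_svd(x, t)
print(float(np.abs(x_poisoned - x).max()))   # small residual perturbation
```

Because only the components beyond rank k are modified, the poisoned image keeps the dominant visual structure of the original, which is the intuition behind the stealthiness claim.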

Key words: backdoor attack, stage-wise adversarial optimization, Singular Value Decomposition, sample-specific, model security
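The staged training framework can be sketched in the same hedged spirit. The PyTorch-style outline below is one plausible reading of the two stages in the abstract; `classifier`, `generator`, `embed_fn` (a differentiable counterpart of the SVD embedding sketched above), `target_cls`, the optimizers, and the equal loss weighting are placeholders rather than the paper's exact setup.

```python
# Hedged sketch of the two-stage schedule (assumes PyTorch); all models,
# loaders, and loss weights are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def stage1_joint(classifier, generator, embed_fn, loader,
                 target_cls, opt_f, opt_g, device="cpu"):
    """Stage 1: jointly optimize the trigger generator and the classifier so
    poisoned inputs map to the target class while clean accuracy holds."""
    classifier.train(); generator.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_poison = embed_fn(x, generator(x))       # sample-specific trigger
        y_poison = torch.full_like(y, target_cls)  # all-to-one target label
        loss = (F.cross_entropy(classifier(x), y)
                + F.cross_entropy(classifier(x_poison), y_poison))
        opt_f.zero_grad(); opt_g.zero_grad()
        loss.backward()
        opt_f.step(); opt_g.step()

def stage2_finetune(classifier, generator, embed_fn, loader,
                    target_cls, opt_f, device="cpu"):
    """Stage 2: freeze the optimized generator and continue training only
    the backdoored classifier with triggers from the fixed generator."""
    generator.eval()
    for p in generator.parameters():
        p.requires_grad_(False)
    classifier.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            x_poison = embed_fn(x, generator(x))
        y_poison = torch.full_like(y, target_cls)
        loss = (F.cross_entropy(classifier(x), y)
                + F.cross_entropy(classifier(x_poison), y_poison))
        opt_f.zero_grad()
        loss.backward()
        opt_f.step()
```

Splitting the schedule this way matches the abstract's rationale: the joint stage adversarially searches the singular value space for an effective trigger, and the second stage stabilizes the backdoored classifier around that fixed trigger distribution.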