Dynamic Backdoor Attack Based on Adversarial Optimization in Singular Value Space

doi:10.3778/j.issn.1673-9418.2504095

Abstract

Deep learning models have been widely applied in numerous fields because of their exceptional performance, but research has shown that they are also highly vulnerable to backdoor attacks. A backdoor attack compromises the reliability of a model through a covert trigger mechanism: when a preset trigger activates the hidden backdoor, the model executes malicious behavior. Current backdoor attacks rely primarily on trigger patterns built from spatial-domain or frequency-domain perturbations, and most use sample-agnostic static triggers, which makes it relatively easy for defense systems to detect and eliminate the threat. To address the insufficient stealthiness and weak robustness of existing attacks, this paper proposes a dynamic backdoor attack method based on staged adversarial optimization in the singular value space. First, a generator produces sample-specific triggers, and singular value decomposition (SVD) is used to extract the primary and secondary features of both the clean image and the trigger. The trigger information is embedded into the secondary features of the clean image, while the primary features are preserved to maintain the stealthiness of the backdoor. Second, a staged training framework is introduced: the first stage jointly optimizes the trigger generator and the classifier to maximize the effectiveness of the backdoor attack, and the second stage continues training the backdoor model with the optimized trigger generator. To validate the stealthiness and effectiveness of the proposed method, the attack was tested on multiple benchmark datasets. Experimental results show that the proposed method achieves higher attack success rates than existing attack methods on all four datasets, causes almost no degradation in accuracy on benign samples, and evades four state-of-the-art backdoor defense methods. The experiments also confirm that the sensitivity of deep models to singular value perturbations can be maliciously exploited and that existing defense mechanisms struggle to detect such attacks, revealing a new security risk for AI models.

Key words: backdoor attack, staged adversarial optimization, singular value decomposition, sample-specific, model security
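The embedding step described in the abstract can be illustrated with a minimal sketch. It assumes that the "primary features" are the top-k singular components of a single-channel image and that the "secondary features" are the remaining components. The function name `embed_trigger_svd`, the parameter `k`, and the direct substitution of the trigger's low-energy components are illustrative assumptions, not the authors' published implementation, and the trigger generator and staged training are not modeled here.

```python
import numpy as np


def embed_trigger_svd(clean, trigger, k):
    """Combine the top-k singular components of `clean` with the
    remaining (lower-energy) singular components of `trigger`.

    Hypothetical illustration: the split at index k and the direct
    substitution are assumptions, not the paper's exact procedure.
    Both inputs are 2-D arrays of the same shape.
    """
    # Reduced SVD: image = U @ diag(s) @ Vt, singular values sorted descending.
    u_c, s_c, vt_c = np.linalg.svd(clean, full_matrices=False)
    u_t, s_t, vt_t = np.linalg.svd(trigger, full_matrices=False)

    # Primary features: the k largest singular components of the clean image.
    primary = (u_c[:, :k] * s_c[:k]) @ vt_c[:k, :]

    # Secondary features: components k and beyond, taken from the trigger.
    secondary = (u_t[:, k:] * s_t[k:]) @ vt_t[k:, :]

    return primary + secondary
```

Under these assumptions, setting `k` to the full rank returns the clean image unchanged and setting `k` to 0 returns the trigger, so intermediate values trade visual fidelity against the amount of trigger information carried.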

PENG Ziming, DING Jianwei, YAO Jiawang, TIAN Huawei. Dynamic Backdoor Attack Based on Adversarial Optimization in Singular Value Space[J]. Journal of Frontiers of Computer Science and Technology, DOI: 10.3778/j.issn.1673-9418.2504095.

