Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

• Academic Research •

Local Dynamic Clean-label Backdoor Attack with Image Salience Regions

HONG Wei, GENG Peilin, WANG Hongyu, ZHANG Xueqin, GU Chunhua   

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200000, China
    2. Shanghai Key Laboratory of Computer Software Testing & Evaluating, Shanghai 200000, China

Abstract: With the widespread application of deep learning technology, backdoor attacks against deep learning models have become increasingly common. Research on backdoor attacks is therefore of great significance for revealing security risks in the field of artificial intelligence. To address the limitations of existing clean-label backdoor attacks, such as low feasibility in practical scenarios, insufficient stealthiness, and poor attack effectiveness, a local dynamic clean-label backdoor attack method that exploits image salience regions is proposed. Under the premise of having access to only a small amount of target-class data, the method first introduces a surrogate model training approach and employs implicit semantic data augmentation (ISDA) to increase sample diversity during the training phase. Subsequently, a perturbation matching the target class is generated with the mini-batch stochastic gradient descent (MBSGD) optimization algorithm, and a feature disentanglement regularization (FDR) method is designed to enlarge the difference between the features of poisoned images and those of clean images, thereby improving the effectiveness of the attack. To enhance the stealthiness and robustness of the attack, the gradient-weighted class activation mapping (Grad-CAM) algorithm is used to extract the saliency regions of the input images, and the perturbation is restricted to these key pixels, so that the trigger of the generated poisoned samples is locally dynamic. Experiments show that at a poisoning rate of no more than 0.05%, the proposed method still outperforms several state-of-the-art clean-label attack methods and remains a threat to existing defense methods.
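
The abstract outlines a pipeline rather than an implementation. As a rough illustration of the final masking step it describes, the sketch below shows how a precomputed target-class perturbation could be restricted to the Grad-CAM saliency region of an image. It is a hedged example only: PyTorch, a torchvision ResNet-18 as the surrogate model, the grad_cam_mask and poison helpers, the keep_ratio threshold, the L-infinity budget eps, and the choice of class for which the CAM is computed are all assumptions for illustration, not the authors' code.

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()  # stand-in surrogate model (assumption)

def grad_cam_mask(image, target_class, keep_ratio=0.3):
    """Binary mask keeping the top `keep_ratio` most salient pixels for `target_class`."""
    activations, gradients = [], []
    layer = model.layer4  # last convolutional block of ResNet-18
    h1 = layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

    logits = model(image.unsqueeze(0))
    model.zero_grad()
    logits[0, target_class].backward()
    h1.remove(); h2.remove()

    # Grad-CAM: weight each channel by its globally averaged gradient, sum, then ReLU
    weights = gradients[0].mean(dim=(2, 3), keepdim=True)               # [1, C, 1, 1]
    cam = F.relu((weights * activations[0]).sum(dim=1, keepdim=True))   # [1, 1, h, w]
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear",
                        align_corners=False).squeeze()
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)            # normalize to [0, 1]

    threshold = torch.quantile(cam.flatten(), 1.0 - keep_ratio)
    return (cam >= threshold).float()                                    # [H, W] binary mask

def poison(image, perturbation, target_class, eps=8 / 255):
    """Apply a (precomputed) perturbation only inside the salient region of `image`."""
    mask = grad_cam_mask(image, target_class)       # local support for the trigger
    delta = perturbation.clamp(-eps, eps) * mask    # restrict the trigger to key pixels
    return (image + delta).clamp(0.0, 1.0)

if __name__ == "__main__":
    img = torch.rand(3, 224, 224)                               # stand-in clean image
    delta = torch.empty_like(img).uniform_(-8 / 255, 8 / 255)   # stand-in perturbation
    print(poison(img, delta, target_class=0).shape)             # torch.Size([3, 224, 224])

Because the mask follows the saliency map of each individual image, the resulting trigger differs from sample to sample, which is the "local dynamic" property the abstract refers to.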

Key words: deep learning, backdoor attack, clean-label attack, saliency regions, feature disentanglement