Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (4): 1068-1082. DOI: 10.3778/j.issn.1673-9418.2305032

• Network and Security •

Image-Text Retrieval Backdoor Attack with Diffusion-Based Image Editing

YANG Shun, LU Hengyang   

  1. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2024-04-01    Published: 2024-04-01

Abstract: Deep neural networks are susceptible to backdoor attacks during the training stage. When an image-text retrieval model is trained, an attacker who maliciously injects image-text pairs carrying a backdoor trigger into the training dataset can embed a hidden backdoor into the model. During inference, the infected model performs well on benign samples, whereas the secret trigger activates the hidden backdoor and maliciously changes the inference result to one chosen by the attacker. Existing research on backdoor attacks against image-text retrieval overlays trigger patterns directly on images, which suffers from low attack success rates, obvious abnormal features in the poisoned images, and poor visual concealment. This paper proposes Diffusion-MUBA, a backdoor attack method for image-text retrieval models that combines diffusion-based image editing. Based on the correspondence between text keywords and regions of interest (ROI) in image-text pairs, trigger prompts are designed for the diffusion model, which edits the ROI of each image sample to generate visually concealed, smooth, and natural poisoned training samples. Fine-tuning the pretrained model on these samples establishes an incorrect fine-grained word-to-region alignment in the retrieval model and thereby embeds the hidden backdoor. This paper designs the attack strategy for diffusion-based image editing and builds a bidirectional backdoor attack model that achieves strong results in both image-to-text and text-to-image retrieval experiments: it improves the attack success rate over other backdoor attack methods while avoiding the introduction of characteristic trigger patterns, watermarks, perturbations, or local distortions and deformations into the poisoned samples. On this basis, a backdoor defense method based on object detection and text matching is proposed. It is hoped that this study of the feasibility, concealment, and implementation of backdoor attacks on image-text retrieval will stimulate further research and advance the field of multimodal backdoor attack and defense.
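The poisoning step described above lends itself to a short illustration. The following is a minimal sketch, assuming an off-the-shelf Stable Diffusion inpainting checkpoint from the diffusers library; the checkpoint name, ROI coordinates, and trigger prompt are hypothetical stand-ins, not the paper's released configuration.

```python
# Hedged sketch of the poisoning step: edit the ROI of a benign image
# with a diffusion inpainting model so the poisoned sample stays smooth
# and natural. Checkpoint, ROI box, and trigger prompt are illustrative
# assumptions, not the paper's actual configuration.
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed public checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Benign image whose caption contains the keyword tied to the ROI.
image = Image.open("benign_sample.jpg").convert("RGB").resize((512, 512))

# White pixels mark the ROI to be rewritten; the box is hypothetical and
# would come from the keyword-to-region correspondence in practice.
mask = Image.new("L", image.size, 0)
ImageDraw.Draw(mask).rectangle([120, 140, 360, 420], fill=255)

# The trigger prompt rewrites the ROI content instead of overlaying a
# visible patch, watermark, or perturbation.
trigger_prompt = "a cat sitting on the grass"  # hypothetical trigger

poisoned = pipe(prompt=trigger_prompt, image=image, mask_image=mask).images[0]
poisoned.save("poisoned_sample.jpg")
```

Pairing such an edited image with an attacker-chosen caption and injecting these pairs into the training set is what fine-tuning then turns into the incorrect word-to-region alignment described above.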
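The defense mentioned at the end of the abstract can be read as a consistency check between detected objects and caption text. Below is one plausible minimal realization using a torchvision Faster R-CNN detector; the detector choice, confidence threshold, and keyword-matching rule are assumptions rather than the paper's exact method.

```python
# Hedged sketch of the object-detection + text-matching screen: flag an
# image-text pair when none of the confidently detected objects appears
# in the caption. Detector, threshold, and matching rule are assumptions.
import torch
from PIL import Image
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights,
    fasterrcnn_resnet50_fpn,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]   # COCO class names
preprocess = weights.transforms()

def detected_objects(path, score_thresh=0.7):
    """Return names of objects detected above the confidence threshold."""
    img = preprocess(Image.open(path).convert("RGB"))
    with torch.no_grad():
        pred = detector([img])[0]
    return {
        categories[label]
        for label, score in zip(pred["labels"].tolist(),
                                pred["scores"].tolist())
        if score >= score_thresh
    }

def is_suspicious(image_path, caption):
    """Flag the pair when no detected object is mentioned in the caption."""
    # A poisoned pair edited by a trigger prompt tends to show an object
    # (e.g. "cat") that its benign caption never mentions.
    return not any(obj.lower() in caption.lower()
                   for obj in detected_objects(image_path))

print(is_suspicious("poisoned_sample.jpg", "a dog sitting on the grass"))
```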

Key words: backdoor attack, image-text retrieval, diffusion model, region of interest