Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (4): 1068-1082. DOI: 10.3778/j.issn.1673-9418.2305032

• Network·Security •

Image-Text Retrieval Backdoor Attack with Diffusion-Based Image-Editing

YANG Shun, LU Hengyang   

  1. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2024-04-01    Published: 2024-04-01

Abstract: Deep neural networks are susceptible to backdoor attacks during the training stage. When an image-text retrieval model is trained, an attacker who maliciously injects image-text pairs carrying a backdoor trigger into the training dataset embeds a backdoor into the model. During inference, the infected model performs well on benign samples, whereas the secret trigger activates the hidden backdoor and maliciously changes the inference result to one preset by the attacker. Existing research on backdoor attacks against image-text retrieval overlays trigger patterns directly on images, which suffers from a low attack success rate, obvious abnormal features in the poisoned images, and poor visual concealment. This paper proposes a new backdoor attack method for image-text retrieval models based on diffusion models (Diffusion-MUBA). Trigger prompts are designed for the diffusion model, and, following the correspondence between text keywords and regions of interest (ROI) in image-text pairs, the ROI of each image sample is edited to generate covert, smooth, and natural poisoned training samples. Fine-tuning the pretrained model on these samples establishes incorrect fine-grained word-to-region alignments in the image-text retrieval model and embeds a hidden backdoor into it. This paper designs the attack strategy of diffusion-based image editing and builds a bidirectional backdoor attack model, achieving good results in both image-to-text and text-to-image retrieval backdoor attack experiments. Compared with other backdoor attack methods, Diffusion-MUBA improves the attack success rate and avoids introducing trigger patterns with specific characteristics, watermarks, perturbations, or local distortions and deformations into the poisoned samples. On this basis, this paper further proposes a backdoor defense method based on object detection and text matching. It is hoped that this study on the feasibility, concealment, and implementation of backdoor attacks on image-text retrieval can contribute to the development of multimodal backdoor attack and defense.
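The poisoning step described above edits only the ROI of a clean image with a diffusion inpainting model so that the poisoned pair still looks natural. Below is a minimal sketch of that step, assuming the Hugging Face diffusers inpainting API; the checkpoint id, trigger prompt, file names, and the poison_sample helper are illustrative assumptions, not the paper's released code.

```python
# A minimal sketch of the poisoning step, assuming a Stable Diffusion
# inpainting checkpoint loaded via the diffusers library. Checkpoint id,
# trigger prompt, and file names are illustrative assumptions.
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"  # assumed checkpoint id
)

def poison_sample(image, roi_mask, trigger_prompt):
    # The diffusion model repaints only the masked ROI and leaves the rest
    # of the image untouched, so the edit stays smooth and natural.
    return pipe(prompt=trigger_prompt, image=image,
                mask_image=roi_mask).images[0]

clean = Image.open("sample.jpg").convert("RGB").resize((512, 512))
mask = Image.open("roi_mask.png").convert("L").resize((512, 512))  # white = region to edit
poisoned = poison_sample(clean, mask, "a red rose in a vase")  # illustrative trigger prompt
poisoned.save("poisoned_sample.jpg")
# The attacker then pairs the edited image with a chosen caption and injects
# the pair into the training set before the retrieval model is fine-tuned.
```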

Key words: backdoor attack, image-text retrieval, diffusion model, region of interest
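The proposed defense pairs an object detector with text matching: an image-text pair is flagged when confidently detected objects never appear in the caption, which is the symptom of an ROI edited away from the text. A minimal sketch follows, assuming torchvision's Faster R-CNN as the detector; the detector choice, threshold, and is_suspicious helper are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of the object-detection + text-matching defense idea.
# Detector choice (torchvision Faster R-CNN) and the 0.7 score threshold
# are assumptions for illustration only.
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]  # COCO class names

def is_suspicious(image_tensor, caption, score_thresh=0.7):
    # image_tensor: float CHW tensor with values in [0, 1]
    with torch.no_grad():
        out = detector([image_tensor])[0]
    detected = {categories[int(i)] for i, s in zip(out["labels"], out["scores"])
                if float(s) > score_thresh}
    # If none of the confidently detected objects is mentioned in the
    # caption, the ROI may have been edited away from the text, so the
    # pair is flagged as possibly poisoned.
    return bool(detected) and not any(c in caption.lower() for c in detected)
```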
