Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (5): 1038-1048. DOI: 10.3778/j.issn.1673-9418.2210061
QIAN Hanwei, SUN Weisong
Online: 2023-05-01
Published: 2023-05-01
Abstract: Backdoor attacks on neural networks aim to implant hidden backdoors into deep neural networks, so that the attacked model behaves normally on benign test samples but abnormally on poisoned test samples that carry the backdoor trigger, for example predicting a poisoned sample as the attacker's target class. This paper gives a comprehensive review of existing attack and defense methods. Taking the attack target as the primary classification criterion, attack methods are divided into data poisoning attacks, physical-world attacks, poisoned-model attacks, and other attacks. Existing backdoor attack and defense techniques are summarized from the perspective of the arms race between attackers and defenders, and defense methods are divided into identifying poisoned data, identifying poisoned models, filtering attack data, and other categories. The causes of backdoor vulnerabilities in deep neural networks are explored from the perspectives of the geometric principles of deep learning and visualization, and the difficulties of backdoor attack and defense as well as future research directions are discussed from the perspectives of software engineering and program analysis. This survey is intended to help researchers follow the research progress on backdoor attacks and defenses in deep neural networks, and to provide inspiration for designing more robust deep neural networks.
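To make the trigger mechanism described in the abstract concrete, below is a minimal, hypothetical sketch of BadNets-style data poisoning: a small patch is stamped onto a fraction of the training images and their labels are switched to the attacker's target class. The function name and parameters (poison_rate, target_label, patch_size) are illustrative assumptions rather than details from any specific surveyed paper.

```python
import numpy as np

def poison_dataset(images, labels, target_label=0, poison_rate=0.05,
                   patch_size=3, patch_value=1.0, seed=0):
    """Hypothetical BadNets-style poisoning: stamp a trigger patch on a random
    subset of images and relabel those samples to the attacker's target class."""
    # Copy so the benign dataset is left untouched.
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Stamp a bright square in the bottom-right corner as the backdoor trigger.
    images[idx, -patch_size:, -patch_size:] = patch_value
    # Relabel the poisoned samples to the attacker's target class.
    labels[idx] = target_label
    return images, labels, idx

# Toy usage: 200 grayscale 28x28 images with 10 classes.
X = np.random.rand(200, 28, 28).astype(np.float32)
y = np.random.randint(0, 10, size=200)
X_p, y_p, poisoned_idx = poison_dataset(X, y, target_label=7, poison_rate=0.1)
print(f"poisoned {len(poisoned_idx)} of {len(X)} samples; new label: {y_p[poisoned_idx[0]]}")
```

A model trained on such a mixture would tend to classify clean inputs correctly while predicting target_label whenever the patch appears, which is exactly the benign-normal, trigger-abnormal behavior the abstract describes.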
QIAN Hanwei, SUN Weisong. Survey on Backdoor Attacks and Countermeasures in Deep Neural Network[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(5): 1038-1048.
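The abstract groups defenses into identifying poisoned data, identifying poisoned models, and filtering attack data. As an illustration of the first category only, below is a minimal, hypothetical sketch in the spirit of activation clustering: within each class, penultimate-layer activations are split into two clusters, and an unusually small cluster is flagged as potentially poisoned. The helper name, the two-cluster choice, and the size threshold are illustrative assumptions, not details taken from the survey.

```python
import numpy as np
from sklearn.cluster import KMeans

def flag_suspicious_samples(activations, labels, small_cluster_ratio=0.35):
    """For each class, cluster penultimate-layer activations into two groups and
    flag the smaller group as potentially poisoned when it is unusually small."""
    suspicious = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if len(idx) < 2:
            continue
        assignments = KMeans(n_clusters=2, n_init=10,
                             random_state=0).fit_predict(activations[idx])
        sizes = np.bincount(assignments, minlength=2)
        minority = int(np.argmin(sizes))
        # A markedly smaller cluster within one class is a common poisoning signature.
        if sizes[minority] / len(idx) < small_cluster_ratio:
            suspicious[idx[assignments == minority]] = True
    return suspicious

# Toy usage with synthetic 64-dimensional activations for 3 classes.
acts = np.random.randn(300, 64)
labs = np.random.randint(0, 3, size=300)
print("flagged:", flag_suspicious_samples(acts, labs).sum())
```

In a real pipeline the activations would come from the trained model under inspection, and the flagged samples would be removed before retraining.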