Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (5): 1038-1048. DOI: 10.3778/j.issn.1673-9418.2210061
QIAN Hanwei, SUN Weisong
Online: 2023-05-01
Published: 2023-05-01
Abstract: Backdoor attacks on neural networks aim to implant hidden backdoors into deep neural networks, so that the attacked model behaves normally on benign test samples but abnormally on poisoned test samples that carry the backdoor trigger, for example predicting a poisoned sample as the attacker's target class. This paper gives a comprehensive review of existing attack and defense methods. Taking the attack target as the primary classification criterion, attack methods are divided into data poisoning attacks, physical-world attacks, poisoned-model attacks, and other attacks. Existing backdoor attack and defense techniques are summarized from the perspective of the arms race between attackers and defenders, and defense methods are divided into identifying poisoned data, identifying poisoned models, filtering attack data, and other categories. The causes of backdoor vulnerabilities in deep neural networks are explored from the perspectives of the geometric principles of deep learning and visualization, and the difficulties of backdoor attack and defense as well as future research directions are discussed from the perspectives of software engineering and program analysis. This survey is intended to help researchers follow the research progress on backdoor attacks and defenses in deep neural networks, and to provide inspiration for designing more robust deep neural networks.
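To make the trigger mechanism described in the abstract concrete, below is a minimal, hypothetical sketch of BadNets-style data poisoning: a small patch is stamped onto a fraction of the training images and their labels are switched to the attacker's target class. The function name and parameters (poison_rate, target_label, patch_size) are illustrative assumptions rather than details from any specific surveyed paper.

```python
import numpy as np

def poison_dataset(images, labels, target_label=0, poison_rate=0.05,
                   patch_size=3, patch_value=1.0, seed=0):
    """Hypothetical BadNets-style poisoning: stamp a trigger patch on a random
    subset of images and relabel those samples to the attacker's target class."""
    # Copy so the benign dataset is left untouched.
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Stamp a bright square in the bottom-right corner as the backdoor trigger.
    images[idx, -patch_size:, -patch_size:] = patch_value
    # Relabel the poisoned samples to the attacker's target class.
    labels[idx] = target_label
    return images, labels, idx

# Toy usage: 200 grayscale 28x28 images with 10 classes.
X = np.random.rand(200, 28, 28).astype(np.float32)
y = np.random.randint(0, 10, size=200)
X_p, y_p, poisoned_idx = poison_dataset(X, y, target_label=7, poison_rate=0.1)
print(f"poisoned {len(poisoned_idx)} of {len(X)} samples; new label: {y_p[poisoned_idx[0]]}")
```

A model trained on such a mixture would tend to classify clean inputs correctly while predicting target_label whenever the patch appears, which is exactly the benign-normal, trigger-abnormal behavior the abstract describes.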
QIAN Hanwei, SUN Weisong. Survey on Backdoor Attacks and Countermeasures in Deep Neural Network[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(5): 1038-1048.
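The abstract groups defenses into identifying poisoned data, identifying poisoned models, and filtering attack data. As an illustration of the first category only, below is a minimal, hypothetical sketch in the spirit of activation clustering: within each class, penultimate-layer activations are split into two clusters, and an unusually small cluster is flagged as potentially poisoned. The helper name, the two-cluster choice, and the size threshold are illustrative assumptions, not details taken from the survey.

```python
import numpy as np
from sklearn.cluster import KMeans

def flag_suspicious_samples(activations, labels, small_cluster_ratio=0.35):
    """For each class, cluster penultimate-layer activations into two groups and
    flag the smaller group as potentially poisoned when it is unusually small."""
    suspicious = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if len(idx) < 2:
            continue
        assignments = KMeans(n_clusters=2, n_init=10,
                             random_state=0).fit_predict(activations[idx])
        sizes = np.bincount(assignments, minlength=2)
        minority = int(np.argmin(sizes))
        # A markedly smaller cluster within one class is a common poisoning signature.
        if sizes[minority] / len(idx) < small_cluster_ratio:
            suspicious[idx[assignments == minority]] = True
    return suspicious

# Toy usage with synthetic 64-dimensional activations for 3 classes.
acts = np.random.randn(300, 64)
labs = np.random.randint(0, 3, size=300)
print("flagged:", flag_suspicious_samples(acts, labs).sum())
```

In a real pipeline the activations would come from the trained model under inspection, and the flagged samples would be removed before retraining.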