Survey of Image Adversarial Example Defense Techniques

doi:10.3778/j.issn.1673-9418.2303080

Abstract

The rapid development and wide application of artificial intelligence have introduced new security problems, and the generation of and defense against adversarial examples targeting deep neural networks is one of the research hot spots. Deep neural networks are most widely used in the image domain and are also most easily fooled by image adversarial examples, so research on defense techniques against image adversarial examples is an important means of improving the security of AI applications. There is still no unified explanation of why image adversarial examples exist, but the phenomenon can be observed and understood from different perspectives, which offers insights for designing targeted defenses. This paper reviews and analyzes the current mainstream hypotheses on the existence of adversarial examples, including the blind spot hypothesis, the linear hypothesis, the decision boundary hypothesis and the feature hypothesis, as well as the relationships between these hypotheses and typical adversarial example generation methods. On this basis, it summarizes image adversarial example defense techniques along two dimensions, model-based and data-based, and compares the applicable scenarios, advantages and disadvantages of the different methods. Most existing defense techniques target specific adversarial example generation methods, and there is not yet a unified defense theory or method. In practical applications, the existing defense methods need to be selected, optimized and combined according to the specific application scenario, the potential security risks and other factors. Future research can go deeper into generalized defense theory, evaluation of defense effectiveness and systematic protection strategies.

Key words: adversarial examples, artificial intelligence security, adversarial defense
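The abstract links the linear hypothesis to typical adversarial example generation methods. Under that hypothesis, a model that behaves linearly with score w·x can be shifted by ε‖w‖₁ when every input component moves by at most ε in the direction sign(w), and this shift grows with the input dimension; the fast gradient sign method (FGSM) exploits exactly this. The following is a minimal NumPy sketch of that idea plus FGSM-based adversarial training, a widely used defense. It assumes a binary logistic-regression model and inputs scaled to [0, 1]; all function names (`fgsm`, `train_logreg`, etc.), weights and toy data are hypothetical illustrations, not code from the survey.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_loss(w, b, X, y):
    """Mean binary cross-entropy of a logistic-regression model on a batch."""
    p = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def input_gradient(w, b, X, y):
    """d(loss)/d(input) for each row of X; for the logistic loss it is (p - y) * w."""
    p = sigmoid(X @ w + b)
    return (p - y)[:, None] * w[None, :]


def fgsm(w, b, X, y, eps):
    """FGSM: one step of size eps along the sign of the input gradient.

    Assumes inputs live in [0, 1] (hypothetical pixel range), so the result is clipped.
    """
    X_adv = X + eps * np.sign(input_gradient(w, b, X, y))
    return np.clip(X_adv, 0.0, 1.0)


def train_logreg(X, y, epochs=200, lr=0.1, eps=0.0, alpha=0.5):
    """Gradient-descent training; eps > 0 switches on FGSM adversarial training.

    The adversarial objective is alpha * L(X) + (1 - alpha) * L(X_adv), where X_adv is
    regenerated from the current model every epoch and treated as a constant.
    """
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        if eps > 0:
            Xs = np.vstack([X, fgsm(w, b, X, y, eps)])
            ys = np.concatenate([y, y])
            c = np.concatenate([np.full(m, alpha), np.full(m, 1.0 - alpha)]) / m
        else:
            Xs, ys, c = X, y, np.full(m, 1.0 / m)
        r = c * (sigmoid(Xs @ w + b) - ys)  # per-example weighted residuals
        w = w - lr * (Xs.T @ r)             # gradient of the weighted loss w.r.t. w
        b = b - lr * r.sum()                # gradient of the weighted loss w.r.t. b
    return w, b


if __name__ == "__main__":
    # Linear hypothesis: each input moves by at most eps, yet the score shift
    # equals -eps * ||w||_1 and therefore grows linearly with the dimension n.
    eps = 0.01
    for n in (10, 100, 1000):
        w = np.where(np.arange(n) % 2 == 0, 0.1, -0.1)  # hypothetical weights
        x = np.full((1, n), 0.5)
        x_adv = fgsm(w, 0.0, x, np.array([1.0]), eps)
        shift = float(((x_adv - x) @ w)[0])
        print(f"n={n:5d}  score shift={shift:+.3f}  -eps*||w||_1={-eps * np.abs(w).sum():+.3f}")

    # Standard vs. FGSM adversarial training on hypothetical toy data.
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(200, 20))
    y = (X.mean(axis=1) > 0.5).astype(float)
    for name, e in (("standard", 0.0), ("FGSM adversarial training", 0.05)):
        w, b = train_logreg(X, y, eps=e)
        clean = logistic_loss(w, b, X, y)
        attacked = logistic_loss(w, b, fgsm(w, b, X, y, 0.05), y)
        print(f"{name}: clean loss={clean:.3f}, FGSM loss={attacked:.3f}")
```

Treating the FGSM examples as constants inside each epoch keeps the update a plain weighted logistic-regression step; real image defenses would apply the same loop to a deep network and typically use stronger multi-step attacks, which this sketch does not attempt.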

LIU Ruiqi, LI Hu, WANG Dongxia, ZHAO Chongyang, LI Boyu. Survey of Image Adversarial Example Defense Techniques[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(12): 2827-2839.
