[1] SZEGEDY C, ZAREMBA W, SUTSKEVER I, et al. Intriguing properties of neural networks[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1312.6199.
[2] GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1412.6572.
[3] MOOSAVI-DEZFOOLI S M, FAWZI A, FROSSARD P. DeepFool: a simple and accurate method to fool deep neural networks[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 2574-2582.
[4] CARLINI N, WAGNER D. Towards evaluating the robustness of neural networks[C]//Proceedings of the 2017 IEEE Symposium on Security and Privacy. Piscataway: IEEE, 2017: 39-57.
[5] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 4171-4186.
[6] BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[C]//Advances in Neural Information Processing Systems 33, 2020: 1877-1901.
[7] JIN D, JIN Z J, ZHOU J T, et al. Is BERT really robust? A strong baseline for natural language attack on text classification and entailment[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 8018-8025.
[8] VITORINO J, MAIA E, PRAÇA I. Adversarial evasion attack efficiency against large language models[EB/OL]. [2024-07-18]. https://arxiv.org/abs/2406.08050.
[9] XU X, KONG K, LIU N, et al. An LLM can fool itself: a prompt-based adversarial attack[EB/OL]. [2024-07-18]. https://arxiv.org/abs/2310.13345.
[10] MIYATO T, DAI A M, GOODFELLOW I J. Adversarial training methods for semi-supervised text classification[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1605.07725.
[11] LI J, JI S, DU T, et al. TextBugger: generating adversarial text against real-world applications[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1812.05271.
[12] WIYATNO R R, XU A, DIA O A, et al. Adversarial examples in modern machine learning: a review[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1911.05268.
[13] MADRY A, MAKELOV A, SCHMIDT L, et al. Towards deep learning models resistant to adversarial attacks[EB/OL]. [2024-07-18]. https://arxiv.org/abs/1706.06083.
[14] AKHTAR N, MIAN A, KARDAN N, et al. Advances in adversarial attacks and defenses in computer vision: a survey[J]. IEEE Access, 2021, 9: 155161-155196.
[15] RAINA V, TAN S, CEVHER V, et al. Extreme miscalibration and the illusion of adversarial robustness[C]//Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2024: 2500-2525.
[16] BAO R, ZHENG R, DING L, et al. CASN: class-aware score network for textual adversarial detection[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2023: 671-687.
[17] YOO K, KIM J, JANG J, et al. Detection of adversarial examples in text classification: benchmark and baseline via robust density estimation[C]//Findings of the Association for Computational Linguistics: ACL 2022. Stroudsburg: ACL, 2022: 3656-3672.
[18] ZHOU Y C, JIANG J Y, CHANG K W, et al. Learning to discriminate perturbations for blocking adversarial attacks in text classification[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2019: 4904-4913.
[19] MOZES M, STENETORP P, KLEINBERG B, et al. Frequency-guided word substitutions for detecting textual adversarial examples[C]//Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics. Stroudsburg: ACL, 2021: 171-186.
[20] LIU N, DRAS M, ZHANG W E. Detecting textual adversarial examples based on distributional characteristics of data representations[C]//Proceedings of the 7th Workshop on Representation Learning for NLP. Stroudsburg: ACL, 2022: 78-90.
[21] MOSCA E, AGARWAL S, RANDO RAMÍREZ J, et al. “That is a suspicious reaction!”: interpreting logits variation to detect NLP adversarial attacks[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2022: 7806-7816.
[22] HADSELL R, CHOPRA S, LECUN Y. Dimensionality reduction by learning an invariant mapping[C]//Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2006: 1735-1742.
[23] REIMERS N, GUREVYCH I. Sentence-BERT: sentence embeddings using siamese BERT-networks[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2019: 3982-3992.
[24] GU A, DAO T M. Mamba: linear-time sequence modeling with selective state spaces[EB/OL]. [2024-07-19]. https://arxiv.org/abs/2312.00752.
[25] REN S H, DENG Y H, HE K, et al. Generating natural language adversarial examples through probability weighted word saliency[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 1085-1097.
[26] LI L Y, MA R T, GUO Q P, et al. BERT-ATTACK: adversarial attack against BERT using BERT[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2020: 6193-6202.
[27] HE K M, FAN H Q, WU Y X, et al. Momentum contrast for unsupervised visual representation learning[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 9726-9735.
[28] TUNSTALL L, REIMERS N, JO U E, et al. Efficient few-shot learning without prompts[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2022: 3638-3652.
[29] OHASHI S, TAKAYAMA J, KAJIWARA T, et al. Text classification with negative supervision[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 351-357.
[30] ROBINSON J, CHUANG C Y, SRA S, et al. Contrastive learning with hard negative samples[EB/OL]. [2024-07-19]. https://arxiv.org/abs/2010.04592.
[31] ZANTEDESCHI V, NICOLAE M I, RAWAT A. Efficient defenses against adversarial attacks[EB/OL]. [2024-07-19]. https://arxiv.org/abs/1707.06728.
[32] LIU X Q, CHENG M H, ZHANG H, et al. Towards robust neural networks via random self-ensemble[C]//Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 381-397.
[33] LIU X Q, XIAO T S, SI S, et al. How does noise help robustness? Explanation and exploration under the neural SDE framework[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 279-287.
[34] SONG Y, ERMON S. Generative modeling by estimating gradients of the data distribution[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates, 2019: 1067.
[35] KIRKPATRICK S, GELATT C D, VECCHI M P. Optimization by simulated annealing[J]. Science, 1983, 220(4598): 671-680.
[36] KINGMA D P, WELLING M. Auto-encoding variational Bayes[C]//Proceedings of the 2nd International Conference on Learning Representations, 2014.
[37] VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders[C]//Proceedings of the 25th International Conference on Machine Learning, 2008: 1096-1103.
[38] GUNEL B, DU J, CONNEAU A, et al. Supervised contrastive learning for pre-trained language model fine-tuning[EB/OL]. [2024-07-19]. https://arxiv.org/abs/2011.01403.
[39] KHOSLA P, TETERWAK P, WANG C, et al. Supervised contrastive learning[C]//Advances in Neural Information Processing Systems 33, 2020: 18661-18673.
[40] ZENG G Y, QI F C, ZHOU Q R, et al. OpenAttack: an open-source textual adversarial attack toolkit[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations. Stroudsburg: ACL, 2021: 363-371.
[41] CER D, YANG Y F, KONG S Y, et al. Universal sentence encoder for English[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Stroudsburg: ACL, 2018: 169-174.
[42] MCINNES L, HEALY J, MELVILLE J. UMAP: uniform manifold approximation and projection for dimension reduction[EB/OL]. [2024-07-19]. https://arxiv.org/abs/1802.03426.
[43] LUNDBERG S, LEE S. A unified approach to interpreting model predictions[EB/OL]. [2024-07-19]. https://arxiv.org/abs/1705.07874.