Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2025, Vol. 19, Issue (11): 3094-3107. DOI: 10.3778/j.issn.1673-9418.2502042

• Network · Security •

基于融合特征的元学习对抗样本检测模型

蒋章涛,李欣,薛迪,王晓宇   

  1. 中国人民公安大学 信息网络安全学院,北京 100038
  2. 中国人民公安大学 公安大数据战略研究中心,北京 100038
  3. 安全防范技术与风险评估公安部重点实验室,北京 100026
  • 出版日期:2025-11-01 发布日期:2025-10-30

Meta-Learning Adversarial Example Detection Model Based on Fused Features

JIANG Zhangtao, LI Xin, XUE Di, WANG Xiaoyu   

  1. School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
  2. Public Security Big Data Strategy Research Center, People's Public Security University of China, Beijing 100038, China
  3. Key Laboratory of Security Technology and Risk Assessment, Ministry of Public Security, Beijing 100026, China
  • Online: 2025-11-01 Published: 2025-10-30

摘要: 深度学习模型易受对抗攻击的脆弱性使得对抗样本检测成为一项重要技术。现有检测方法通常依赖大量标注数据进行训练,而新型攻击样本的生成速度远超数据收集与标注效率,致使小样本场景下检测性能显著下降。此外,传统端到端的学习方法存在未能充分利用对抗样本固有特征等问题,限制了检测精度和泛化能力。为解决上述问题,提出了一种基于融合特征与注意力机制的元学习对抗样本检测模型Meta-FAD。该模型旨在模拟安全专家利用元知识快速适应未知攻击的能力,实现仅依赖少量数据检测新型对抗样本。模型融合空域和频域特征构建联合表征,并通过注意力机制模块对特征权重进行动态调控,从而增强对对抗扰动敏感区域的关注。在元学习框架下采用内外循环优化策略对任务级参数进行更新,以确保模型能够迅速适应新型攻击任务。实验结果显示,Meta-FAD在AdvMNIST、AdvFashionMNIST和AdvCIFAR10数据集上实现的2-way 1-shot检测准确率分别为97.42%、89.02%和69.84%,显著优于现有基线模型;消融实验进一步验证了融合特征与注意力机制在提升检测性能方面的关键作用。

关键词: 对抗样本检测, 元学习, 融合特征, 深度学习安全

Abstract: Deep learning models are vulnerable to adversarial attacks, which makes adversarial example detection a critical technique. Existing detection methods typically rely on large amounts of labeled data for training, but new attack samples are generated far faster than data can be collected and annotated, so detection performance degrades significantly in few-shot scenarios. In addition, traditional end-to-end methods fail to fully exploit the inherent features of adversarial examples, which limits detection accuracy and generalization. To address these issues, this paper proposes Meta-FAD, a meta-learning-based adversarial example detection model that employs fused features and an attention mechanism. Meta-FAD aims to simulate the ability of security experts to rapidly adapt to unknown attacks using meta-knowledge, enabling the detection of new adversarial examples from only a small amount of data. The model fuses spatial-domain and frequency-domain features to construct a joint representation and dynamically adjusts feature weights with an attention module, thereby strengthening its focus on regions sensitive to adversarial perturbations. Within the meta-learning framework, it updates task-specific parameters using an inner-outer loop optimization strategy so that it can rapidly adapt to new attack tasks. Experimental results show that Meta-FAD achieves 2-way 1-shot detection accuracies of 97.42%, 89.02%, and 69.84% on the AdvMNIST, AdvFashionMNIST, and AdvCIFAR10 datasets, respectively, significantly outperforming existing baseline models. Ablation experiments further confirm that the fused features and the attention mechanism play a key role in improving detection performance.

Key words: adversarial example detection, meta-learning, fused features, deep learning safety
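
The abstract names the main ingredients (fused spatial and frequency features, 2-way 1-shot tasks, inner-outer loop optimization) but not the architecture, so the following is only a toy sketch of how these pieces could fit together, not the Meta-FAD implementation. Everything in it is an assumption made for illustration: the function names, the logistic detector standing in for the network, the checkerboard perturbation standing in for an attack, and the first-order approximation of the outer update. The attention module is omitted.

```python
import cmath
import math
import random


def dft2_magnitude(img):
    """Normalized magnitude of the 2-D DFT of a small square image (list of rows)."""
    n = len(img)
    mags = []
    for u in range(n):
        for v in range(n):
            s = sum(
                img[x][y] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
                for x in range(n)
                for y in range(n)
            )
            mags.append(abs(s) / (n * n))
    return mags


def fused_features(img):
    """Joint representation: raw pixels (spatial) followed by DFT magnitudes (frequency)."""
    spatial = [p for row in img for p in row]
    return spatial + dft2_magnitude(img)


def predict(w, x):
    """Logistic detector: estimated probability that x is adversarial."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, batch):
    """Mean binary cross-entropy over (features, label) pairs."""
    total = 0.0
    for x, y in batch:
        p = min(max(predict(w, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(batch)


def grad(w, batch):
    """Gradient of the mean binary cross-entropy with respect to w."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(batch)
    return g


def adapt(w, support, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on one task's support set."""
    fast = list(w)
    for _ in range(steps):
        fast = [wi - lr * gi for wi, gi in zip(fast, grad(fast, support))]
    return fast


def make_task(rng, n=4, queries=4):
    """Toy 2-way 1-shot task: clean vs. perturbed images.

    The checkerboard perturbation is a hypothetical stand-in for an attack;
    its strength eps is drawn once per task.
    """
    eps = rng.uniform(0.2, 0.4)

    def sample(label):
        img = [[rng.uniform(0.4, 0.6) for _ in range(n)] for _ in range(n)]
        if label:
            img = [[img[x][y] + eps * (-1) ** (x + y) for y in range(n)]
                   for x in range(n)]
        return fused_features(img) + [1.0], float(label)  # trailing 1.0 is a bias input

    support = [sample(0), sample(1)]  # one example per class
    query = [sample(i % 2) for i in range(queries)]
    return support, query


def meta_train(rng, dim, iterations=100, meta_lr=0.1):
    """Outer loop: move the shared initialization along the query-set gradient
    evaluated at the adapted weights (first-order approximation)."""
    w = [0.0] * dim
    for _ in range(iterations):
        support, query = make_task(rng)
        fast = adapt(w, support)
        w = [wi - meta_lr * gi for wi, gi in zip(w, grad(fast, query))]
    return w
```

For a 4x4 image the feature vector has 16 spatial entries, 16 frequency entries, and one bias input, so `meta_train(random.Random(0), dim=33)` returns a 33-element initialization that `adapt` can then fine-tune on a new task's two support examples. The first-order outer update is chosen here only to keep the sketch short; a full second-order update would also differentiate through the inner-loop steps.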