计算机科学与探索 (Journal of Frontiers of Computer Science and Technology)

• Academic Research •

A Local-Prototype-Based Few-Shot Classification Model with Embedded Hybrid Attention

ZHOU Bin, XIAN Hao, QIN Yijia, ZHOU Hongbin

1. College of Science, Southwest Petroleum University, Chengdu 610500, China
  2. Key Laboratory of Mathematical Geology of Sichuan Province, Chengdu University of Technology, Chengdu 610059, China
  3. Beijing Lanyun Technology Company Limited, Beijing 100100, China

Abstract: In few-shot learning for novel class recognition under scarce data, prototype-network-based methods have gained significant attention. However, traditional methods that compute prototypes by global averaging often suffer from complex-background interference when samples are extremely limited, as well as from insufficient coverage of local features. To address these issues, this paper proposes an enhanced prototype generation approach that integrates local feature representations with a hybrid attention mechanism. First, feature maps of samples are extracted by a backbone network and decomposed into local descriptors; element-wise mean aggregation over these descriptors generates prototypes that fuse multi-scale details, preserving global information while capturing rich details and contextual relationships, thereby improving the representational ability and interference resistance of the prototypes. Second, a hybrid channel-spatial attention mechanism is designed: the dual-pooling channel attention module employs global average pooling and max pooling pathways to dynamically learn channel-wise weights that suppress background noise, while the enhanced spatial attention module evaluates regional relevance through a local feature similarity matrix and combines it with channel statistics derived from global average and max pooling to collaboratively generate a foreground-targeted spatial attention distribution. The two modules focus on critical local regions of images in complementary ways, enhancing prototype representativeness. Compared with existing methods, the proposed approach shows clear advantages on classification tasks of different granularities. For coarse-grained classification, accuracy improvements of 5.60% and 6.02% over the baseline model are achieved under two experimental settings; for fine-grained classification, the proposed method outperforms the best existing methods by an average of 5.42%.
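
The local-descriptor prototype described above can be made concrete with a short sketch. The following PyTorch code is a minimal illustration under stated assumptions: it assumes support feature maps of shape (N, C, H, W) from an unspecified backbone, and the function name build_local_prototypes and the toy shapes are hypothetical rather than taken from the paper.

```python
# Hypothetical sketch: class prototypes built from local descriptors.
# Assumes support features of shape (N, C, H, W) from some backbone;
# names (build_local_prototypes, support_feats, support_labels) are illustrative.
import torch


def build_local_prototypes(support_feats: torch.Tensor,
                           support_labels: torch.Tensor) -> torch.Tensor:
    """Average all H*W local descriptors of every support image in a class.

    support_feats : (N, C, H, W) feature maps from the backbone
    support_labels: (N,) integer class ids in [0, num_classes)
    returns       : (num_classes, C) one prototype vector per class
    """
    n, c, h, w = support_feats.shape
    # Decompose each feature map into H*W local descriptors of dimension C.
    descriptors = support_feats.permute(0, 2, 3, 1).reshape(n, h * w, c)

    prototypes = []
    for cls in torch.unique(support_labels, sorted=True):
        cls_desc = descriptors[support_labels == cls]      # (n_cls, H*W, C)
        # Element-wise mean over all descriptors of the class.
        prototypes.append(cls_desc.reshape(-1, c).mean(dim=0))
    return torch.stack(prototypes)                          # (num_classes, C)


# Toy usage: 5-way 1-shot episode with 6x6 feature maps of 64 channels.
feats = torch.randn(5, 64, 6, 6)
labels = torch.arange(5)
protos = build_local_prototypes(feats, labels)
print(protos.shape)  # torch.Size([5, 64])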

Key words: few-shot learning, image classification, metric learning, prototype networks, hybrid attention mechanism
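
For readers who want a concrete picture of the hybrid attention summarized in the abstract, the sketch below pairs a dual-pooling (average + max) channel attention with a spatial attention that mixes channel statistics and a local-descriptor similarity score. It is a CBAM-style approximation built only from details given in the abstract; the module names, the reduction ratio, and the 7x7 convolution are assumptions, not the authors' exact design.

```python
# Hypothetical sketch of the hybrid attention described in the abstract:
# a dual-pooling (avg + max) channel attention followed by a spatial attention
# that mixes channel statistics with a local-descriptor similarity score.
# Module names and layer sizes are assumptions, not the authors' exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualPoolChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP applied to both the avg-pooled and max-pooled channel vectors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (N, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))                  # (N, C)
        mx = self.mlp(x.amax(dim=(2, 3)))                   # (N, C)
        w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * w                                        # re-weight channels


class EnhancedSpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # 3 input maps: channel-wise average, channel-wise max, similarity score.
        self.conv = nn.Conv2d(3, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (N, C, H, W)
        n, c, h, w = x.shape
        avg_map = x.mean(dim=1, keepdim=True)               # (N, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)                # (N, 1, H, W)
        # Local-descriptor similarity: how strongly each position agrees
        # with every other position (cosine similarity, averaged per row).
        desc = F.normalize(x.flatten(2), dim=1)              # (N, C, H*W)
        sim = torch.bmm(desc.transpose(1, 2), desc)          # (N, H*W, H*W)
        sim_map = sim.mean(dim=2).view(n, 1, h, w)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map, sim_map], dim=1)))
        return x * attn


# Toy usage: apply both modules to a batch of backbone feature maps.
x = torch.randn(2, 64, 6, 6)
hybrid = nn.Sequential(DualPoolChannelAttention(64), EnhancedSpatialAttention())
print(hybrid(x).shape)  # torch.Size([2, 64, 6, 6])
```

Averaging each row of the cosine-similarity matrix gives every spatial position a single relevance score, which is one simple way to turn the "local feature similarity matrix" of the abstract into a map that can be concatenated with the pooled channel statistics before the final convolution.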