Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (9): 2487-2500. DOI: 10.3778/j.issn.1673-9418.2307013

• Artificial Intelligence · Pattern Recognition •

Fusion of Global Enhancement and Local Attention Features for Expression Recognition Network

LIU Juan, WANG Ying, HU Min, HUANG Zhong   

  1. School of Electronic Engineering and Intelligent Manufacturing, Anqing Normal University, Anqing, Anhui 246133, China
    2. Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, School of Computer and Information, Hefei University of Technology, Hefei 230009, China
  • Online: 2024-09-01 Published: 2024-09-01

Abstract: To suppress the effects of occlusion, pose variation, and similar factors on facial expression recognition in natural scenes, an expression recognition network that fuses global enhancement and local attention features (GE-LA) is proposed. First, to acquire enhanced global context information, a channel-spatial global feature enhancement structure is constructed. It uses a channel flow module (CFM) and a spatial flow module (SFM) to obtain symmetric multi-scale channel semantics and pixel-level spatial semantics, respectively, and combines the two types of semantics to generate global enhancement features. Second, to extract local detail features, the efficient channel attention (ECA) mechanism is extended into a channel-spatial attention (CSA) mechanism, on which a local attention module (LAM) is built to obtain high-level channel and spatial semantics. Finally, to improve the network's robustness to occlusion, pose variation, and similar factors, an adaptive strategy is designed to perform a weighted fusion of the global enhancement features and the local attention features, and expression classification is carried out on the adaptively fused features. Experimental results on the natural-scene facial expression datasets RAF-DB and FERPlus show that the proposed network achieves recognition rates of 89.82% and 89.93%, respectively, which are 13.39 and 10.62 percentage points higher than the baseline network ResNet50. Compared with related methods, the proposed method reduces the influence of occlusion and pose variation and achieves better expression recognition performance in natural scenes.
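The abstract does not give the layer-level design of CSA or of the adaptive fusion, so the following is only a minimal sketch under stated assumptions. It shows ECA-style channel attention as it is commonly described (global average pooling, a 1D convolution across neighbouring channels, a sigmoid gate) and one plausible form of weighted fusion, in which two scalar logits are turned into weights by a softmax. The function names `eca_channel_weights` and `adaptive_fusion`, the kernel taps, and the softmax weighting are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def eca_channel_weights(feature_map, kernel):
    """ECA-style channel gates (illustrative, not the paper's CSA).

    feature_map: list of C channels, each a list of H rows of W floats.
    kernel: odd-length list of 1D convolution taps shared by all channels.
    Returns one gate in (0, 1) per channel.
    """
    # Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Zero-pad so every channel sees a full window of neighbours.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    # 1D convolution across channels, followed by a sigmoid gate.
    return [
        sigmoid(sum(tap * padded[c + j] for j, tap in enumerate(kernel)))
        for c in range(len(pooled))
    ]


def adaptive_fusion(global_feat, local_feat, logit_global, logit_local):
    """Weighted fusion of two feature vectors (assumed softmax weighting).

    logit_global and logit_local stand in for learnable scalars; the
    paper's actual adaptive strategy may differ.
    """
    # Numerically stable softmax over the two logits.
    m = max(logit_global, logit_local)
    e_g = math.exp(logit_global - m)
    e_l = math.exp(logit_local - m)
    w_g = e_g / (e_g + e_l)
    w_l = e_l / (e_g + e_l)
    fused = [w_g * g + w_l * l for g, l in zip(global_feat, local_feat)]
    return fused, (w_g, w_l)


if __name__ == "__main__":
    # Two 2x2 channels with hypothetical activations.
    fmap = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    print(eca_channel_weights(fmap, kernel=[0.25, 0.5, 0.25]))
    print(adaptive_fusion([1.0, 2.0], [3.0, 4.0], 0.0, 0.0))
```

With equal logits the two branches receive equal weight. In a trained network the logits would be learned, so a branch that is less reliable under occlusion or pose variation could be down-weighted.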

Key words: facial expression recognition, global enhancement features, local attention features, adaptive fusion strategy