Fusion of Global Enhancement and Local Attention Features for Expression Recognition Network

doi:10.3778/j.issn.1673-9418.2307013

Abstract: To suppress the effects of factors such as occlusion and pose variation on facial expression recognition in natural scenes, an expression recognition network fusing global enhancement and local attention features (GE-LA) is proposed. Firstly, to acquire enhanced global context information, a channel-spatial global feature enhancement structure is constructed. It uses a channel flow module (CFM) and a spatial flow module (SFM) to obtain symmetric multi-scale channel semantics and pixel-level spatial semantics, respectively, and combines the two types of semantics to generate globally enhanced features. Secondly, to extract local detail features, the efficient channel attention (ECA) mechanism is improved into a channel-spatial attention (CSA) mechanism, on which a local attention module (LAM) is built to obtain high-level channel and spatial semantics. Finally, to strengthen the network's robustness against interference from occlusion, pose variation and similar factors, an adaptive strategy is designed to perform weighted fusion of the global enhancement features and local attention features, and expression classification is carried out on the adaptively fused features. Experimental results on the in-the-wild facial expression datasets RAF-DB and FERPlus show that the proposed network achieves expression recognition rates of 89.82% and 89.93%, respectively, which are 13.39 and 10.62 percentage points higher than those of the baseline network ResNet50. Compared with related methods, the proposed method reduces the influence of occlusion and pose variation and achieves better expression recognition performance in natural scenes.
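Two of the mechanisms named in the abstract can be made concrete in a few lines: the ECA channel attention that CSA is built from, and a weighted fusion of the global and local branches. The plain-Python sketch below (no deep learning framework, feature maps as nested lists of shape [C][H][W]) follows the published ECA-Net design: global average pooling, a 1D convolution shared across neighbouring channels with the adaptive kernel size k = |log2(C)/gamma + b/gamma|_odd (gamma = 2, b = 1), a sigmoid, then channel-wise rescaling. The fusion function is an assumption for illustration only: the abstract does not say how the adaptive weights are obtained, so the hypothetical scalar logits `logit_g` and `logit_l` are softmax-normalised here. The sketch does not reproduce CFM, SFM, or the spatial part of CSA.

```python
import math
from typing import List

# A feature map is a nested list with shape [C][H][W] (no batch dimension).
FeatureMap = List[List[List[float]]]


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """ECA-Net adaptive kernel size: nearest odd value of |log2(C)/gamma + b/gamma|."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_attention(x: FeatureMap, weights: List[float]) -> FeatureMap:
    """ECA-style channel attention: GAP -> shared 1D conv over channels -> sigmoid -> rescale."""
    c = len(x)
    k = len(weights)
    pad = (k - 1) // 2
    # Global average pooling: one scalar descriptor per channel.
    gap = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # 1D cross-correlation along the channel axis with zero padding; the same k
    # weights are shared by every channel (local cross-channel interaction).
    conv = []
    for i in range(c):
        s = 0.0
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < c:
                s += weights[j] * gap[idx]
        conv.append(s)
    # Sigmoid turns each response into a per-channel gate in (0, 1).
    attn = [1.0 / (1.0 + math.exp(-v)) for v in conv]
    return [[[v * attn[i] for v in row] for row in x[i]] for i in range(c)]


def adaptive_fuse(g: FeatureMap, loc: FeatureMap, logit_g: float, logit_l: float) -> FeatureMap:
    """HYPOTHETICAL fusion rule: softmax over two scalar logits, then a weighted sum.

    The paper's actual adaptive strategy is not given in the abstract.
    """
    m = max(logit_g, logit_l)  # subtract max for numerical stability
    eg, el = math.exp(logit_g - m), math.exp(logit_l - m)
    wg, wl = eg / (eg + el), el / (eg + el)
    return [[[wg * gv + wl * lv for gv, lv in zip(rg, rl)] for rg, rl in zip(cg, cl)]
            for cg, cl in zip(g, loc)]
```

Because the k convolution weights are shared by all channels, the attention branch adds only k parameters regardless of C, which is the property that distinguishes ECA from SE-style fully connected channel attention.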

Key words: facial expression recognition, global enhancement features, local attention features, adaptive fusion strategy

LIU Juan, WANG Ying, HU Min, HUANG Zhong. Fusion of Global Enhancement and Local Attention Features for Expression Recognition Network[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2487-2500.
