Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (12): 3029-3038. DOI: 10.3778/j.issn.1673-9418.2210079

• Artificial Intelligence · Pattern Recognition •

Class-Balanced Modulation for Facial Expression Recognition

LIU Chengguang, WANG Shanmin, LIU Qingshan

  1. School of Computer Science, Nanjing University of Information Science & Technology, Nanjing 210044, China
  2. Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science & Technology, Nanjing 210044, China
  3. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Online: 2023-12-01 Published: 2023-12-01

Abstract: Facial expression recognition (FER) aims to determine the expression category of a given facial image and has broad application prospects in psychological diagnosis, human-computer interaction, and other fields. In practice, expression datasets tend to have imbalanced class distributions. This imbalance leads to imbalanced feature distributions and imbalanced classifier optimization across expressions, which degrades the performance of FER models. To address this issue, this paper proposes a class-balanced modulation method for facial expression recognition (CBM-Net), which modulates the model for class balance in both the feature learning stage and the classifier optimization stage. CBM-Net consists of two modules: feature modulation and gradient modulation. The feature modulation module balances the feature distributions of all classes by increasing inter-class separability and intra-class compactness in the feature direction. The gradient modulation module uses the statistics of each training batch to inversely adjust the optimization gradient of each classifier, so that the classifiers of all classes converge at a consistent speed and reach their best performance at the same time. Qualitative and quantitative experiments on four popular datasets show that CBM-Net is effective for class-balanced modulation in FER and performs well compared with a range of advanced methods.

Key words: facial expression recognition (FER), class imbalance, class-balanced modulation, feature modulation, gradient modulation
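
The abstract does not give the exact rule used by the gradient modulation module, so the following is only an illustrative sketch of the general idea, not CBM-Net itself. It assumes a linear softmax classifier with one weight row per expression class, and it assumes that "inversely adjust" can be approximated by scaling each row's gradient with a factor inversely proportional to that class's count in the current batch. The function names `class_balanced_scales` and `modulated_classifier_grad`, the smoothing term `eps`, and the normalization to a mean scale of 1 are all hypothetical choices made for this example.

```python
import numpy as np


def class_balanced_scales(labels, num_classes, eps=1.0):
    """Per-class gradient scales, inversely proportional to batch frequency.

    Assumption for illustration: eps smooths empty classes, and the scales
    are normalized so that their mean over classes equals 1.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    inv = 1.0 / (counts + eps)            # rarer class -> larger scale
    return inv * num_classes / inv.sum()  # mean over classes is 1


def modulated_classifier_grad(features, labels, weights):
    """Cross-entropy gradient w.r.t. classifier weights, rescaled per class row."""
    num_classes = weights.shape[0]
    logits = features @ weights.T                       # (batch, classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(num_classes)[labels]
    grad = (probs - onehot).T @ features / len(labels)   # (classes, dim)
    scales = class_balanced_scales(labels, num_classes)
    return grad * scales[:, None], scales


# Toy imbalanced batch: 5 samples of class 0, 2 of class 1, 1 of class 2.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4))
labels = np.array([0, 0, 0, 0, 0, 1, 1, 2])
weights = np.zeros((3, 4))

grad, scales = modulated_classifier_grad(features, labels, weights)
print(scales)  # the frequent class gets the smallest scale
```

In this toy batch the gradient of the majority-class row is damped and that of the rarest class is amplified, which is one simple way to keep per-class classifiers from converging at very different speeds. The paper's actual modulation rule may differ.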