Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (7): 1479-1503.DOI: 10.3778/j.issn.1673-9418.2112081

• Surveys and Frontiers •

Survey of Deep Learning Based Multimodal Emotion Recognition

ZHAO Xiaoming1,2,+, YANG Yijiao1, ZHANG Shiqing2

  1. School of Science, Zhejiang University of Science and Technology, Hangzhou 310000, China
  2. Institute of Intelligent Information Processing, Taizhou University, Taizhou, Zhejiang 318000, China
  • Received: 2021-12-20 Revised: 2022-02-14 Online: 2022-07-01 Published: 2022-03-09
  • Supported by:
    the Natural Science Foundation of Zhejiang Province (LZ20F020002); the National Natural Science Foundation of China (61976149)


  • About the authors:
    ZHAO Xiaoming, born in 1964 in Linhai, Zhejiang, M.S., professor. His research interests include audio and image processing, machine learning, pattern recognition, etc.
    YANG Yijiao, born in 1997, M.S. candidate. Her research interests include affective computing, pattern recognition, etc.
    ZHANG Shiqing, born in 1980, Ph.D., professor. His research interests include affective computing, pattern recognition, etc.


Multimodal emotion recognition aims to recognize human emotional states from the different modalities through which emotion is expressed, such as audio, vision, and text. The topic is important in human-computer interaction, artificial intelligence, affective computing, and related fields, and has attracted much attention. Given the success of deep learning methods across many tasks in recent years, a variety of deep neural networks have been used to learn high-level emotional feature representations for multimodal emotion recognition. To systematically summarize the research progress of deep learning methods in this field, this paper presents a comprehensive analysis and summary of the recent literature on deep learning based multimodal emotion recognition. First, the general framework of multimodal emotion recognition is given, and the commonly used multimodal emotional datasets are introduced. Then, the principles of representative deep learning techniques and their recent advances are briefly reviewed. Subsequently, this paper focuses on the progress in two key steps of multimodal emotion recognition: (1) emotional feature extraction for audio, vision, text, etc., including hand-crafted feature extraction and deep feature extraction; and (2) multimodal information fusion strategies that integrate different modalities. Finally, the challenges and opportunities in this field are analyzed, and future development directions are pointed out.

Key words: emotion recognition, multimodal, deep learning, hand-crafted feature, deep feature, fusion



