Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (7): 1479-1503. DOI: 10.3778/j.issn.1673-9418.2112081
• Surveys and Frontiers •
ZHAO Xiaoming1,2,+, YANG Yijiao1, ZHANG Shiqing2
Received: 2021-12-20
Revised: 2022-02-14
Online: 2022-07-01
Published: 2022-03-09
About the author: ZHAO Xiaoming (1964—), male, born in Linhai, Zhejiang, China, M.S., professor. His research interests include audio and image processing, machine learning, and pattern recognition.
ZHAO Xiaoming, YANG Yijiao, ZHANG Shiqing. Survey of Deep Learning Based Multimodal Emotion Recognition[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1479-1503.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2112081
| Dataset | Year | Modalities | Brief description | Emotion labels |
|---|---|---|---|---|
| eNTERFACE’05[28] | 2006 | Audio, visual | 1,277 audiovisual samples from 42 participants of 14 nationalities | Anger, disgust, fear, happiness, sadness, surprise |
| RML[29] | 2008 | Audio, visual | 720 audiovisual emotional expression samples from 8 participants | Anger, disgust, fear, happiness, sadness, surprise |
| IEMOCAP[30] | 2008 | Audio, visual, gesture, text | 10,039 utterances (average duration 4.5 s, average 11.4 words) from 10 actors | Categorical: neutral, happiness, sadness, anger, surprise, fear, disgust, frustration, excitement, etc.; dimensional: valence, arousal, dominance |
| AFEW[31] | 2012 | Audio, visual | 1,426 video clips | Anger, disgust, fear, happiness, sadness, surprise, neutral |
| BAUM-1s[32] | 2016 | Audio, visual | 1,222 video samples from 31 Turkish subjects | Happiness, anger, sadness, disgust, fear, surprise |
| CHEAVD[17] | 2016 | Audio, visual | 140 min of spontaneous emotional segments from films, TV series, and TV shows; 238 speakers | 26 non-prototypical emotional states; the 8 main emotions are anger, happiness, sadness, worry, anxiety, surprise, disgust, neutral |
| CMU-MOSI[33] | 2016 | Audio, visual, text | 2,199 opinion utterances from 93 speaker videos | Negative, positive |
| RAMAS[34] | 2018 | Audio, visual, gesture, physiological signals | About 7 h of high-quality close-up video recordings of 10 actors | Anger, disgust, happiness, sadness, fear, surprise |
| RAVDESS[35] | 2018 | Audio, visual | 60 speech recordings and 44 songs from 24 actors | Neutral, calm, happiness, sadness, anger, fear, disgust, surprise |
| CMU-MOSEI[36] | 2018 | Audio, visual, text | 3,837 videos from more than 1,000 online YouTube speakers | Happiness, sadness, anger, fear, disgust, surprise |
| MELD[37] | 2019 | Audio, visual, text | Multi-party dialogue clips from the TV series Friends | Anger, disgust, fear, joy, neutral, sadness, surprise; also positive, negative, neutral |
| CH-SIMS[38] | 2020 | Audio, visual, text | 2,281 in-the-wild video clips | Negative, weakly negative, neutral, weakly positive, positive |
| HEU-part1[39] | 2021 | Visual, gesture | 19,004 video clips in total, divided into two parts by data source, with 9,951 subjects overall | Anger, boredom, confusion, disappointment, disgust, fear, happiness, neutral, sadness, surprise |
| HEU-part2[39] | 2021 | Audio, visual, gesture | | |
Table 1 Multimodal emotional datasets
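Note that the corpora in Table 1 mix two annotation schemes: most provide categorical labels only, while IEMOCAP additionally provides dimensional ratings of valence, arousal, and dominance. The sketch below illustrates the distinction with a minimal Python data structure; the field names and sample values are hypothetical, not any dataset's real schema.

```python
# Minimal illustration of categorical vs. dimensional emotion labels.
# Field names and sample values are hypothetical, not a dataset's real schema.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class EmotionSample:
    clip_path: str                                    # audio-visual clip location
    category: str                                     # categorical label, e.g. "sadness"
    vad: Optional[Tuple[float, float, float]] = None  # (valence, arousal, dominance)

# IEMOCAP-style sample: both label types (VAD rated on a 1-5 scale).
s1 = EmotionSample("clips/ses01_utt000.wav", "sadness", vad=(2.0, 2.5, 2.5))
# AFEW-style sample: categorical label only.
s2 = EmotionSample("clips/afew_0012.avi", "surprise")
```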
| Year | Authors | Modalities | Feature extraction | Fusion strategy | Classification/regression | Dataset | Recognition results |
|---|---|---|---|---|---|---|---|
| 2020 | Huang et al.[139] | Audio, visual | Audio: eGeMAPS; visual: geometric features | Model-level fusion (Transformer+LSTM) | Fully connected layer | AVEC 2017 | CCC (arousal): 0.654; CCC (valence): 0.708 |
| 2020 | Liu et al.[141] | Audio, visual | Audio: MFCC, Fbank, etc.; visual: distances between facial landmarks | Feature-level, decision-level, and model-level fusion (two-layer LSTM) | Softmax | eNTERFACE’05 | Acc (6-class): 74.40% |
| 2021 | Liu et al.[142] | Audio, visual | Audio: spectrogram+2D-CNN; visual: VGG16 | Model-level fusion (CapsGCN) | Fully connected layer | eNTERFACE’05 | Acc (6-class): 80.83%; F1-score: 80.23% |
| 2021 | Wang et al.[143] | Audio, visual | Audio: DBM+LSTM; visual: LBPH+SAE+CNN | Decision-level fusion | Softmax | CHEAVD | Acc (6-class): 74.90% |
| 2018 | Hazarika et al.[144] | Audio, text | Audio: MFCC, etc.; text: FastText+CNN | Feature-level fusion (self-attention) | Softmax | IEMOCAP | Acc (4-class): 71.40%; F1-score: 71.30% |
Table 2 Multimodal information fusion methods
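The three fusion strategies named in Table 2 differ in where the modalities meet: feature-level (early) fusion concatenates modality features before a single classifier, decision-level (late) fusion combines per-modality predictions, and model-level fusion lets a shared network learn cross-modal interactions. The PyTorch sketch below is a minimal generic illustration under assumed per-modality feature dimensions, not the architecture of any surveyed system.

```python
# Minimal sketch of the three fusion families in Table 2 (illustrative only;
# dimensions and heads are assumptions, not any surveyed paper's design).
import torch
import torch.nn as nn

class FeatureLevelFusion(nn.Module):
    """Early fusion: concatenate modality features, classify jointly."""
    def __init__(self, audio_dim: int, visual_dim: int, n_classes: int):
        super().__init__()
        self.classifier = nn.Linear(audio_dim + visual_dim, n_classes)

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.cat([audio, visual], dim=-1))

class DecisionLevelFusion(nn.Module):
    """Late fusion: classify each modality separately, then average scores."""
    def __init__(self, audio_dim: int, visual_dim: int, n_classes: int):
        super().__init__()
        self.audio_head = nn.Linear(audio_dim, n_classes)
        self.visual_head = nn.Linear(visual_dim, n_classes)

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        return 0.5 * (self.audio_head(audio) + self.visual_head(visual))

class ModelLevelFusion(nn.Module):
    """Intermediate fusion: a shared LSTM models cross-modal dynamics over
    time-aligned feature sequences before the final classifier."""
    def __init__(self, audio_dim: int, visual_dim: int, hidden: int, n_classes: int):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim + visual_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, audio_seq: torch.Tensor, visual_seq: torch.Tensor) -> torch.Tensor:
        # audio_seq, visual_seq: (batch, time, dim), assumed time-aligned
        fused, _ = self.lstm(torch.cat([audio_seq, visual_seq], dim=-1))
        return self.classifier(fused[:, -1])  # last time step
```

The decision-level variant above fixes the combination to a simple average; the surveyed systems typically learn modality weights instead, or, as in the model-level rows of Table 2, replace the LSTM with a Transformer or graph network.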
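Huang et al.[139] report the concordance correlation coefficient (CCC), the standard AVEC metric for continuous arousal/valence prediction. Unlike Pearson correlation, CCC also penalizes systematic shifts in mean and scale between predictions and gold ratings. A small reference implementation of Lin's standard formula, assuming 1D arrays of predictions and ratings:

```python
import numpy as np

def ccc(pred, gold) -> float:
    """Concordance correlation coefficient: penalizes both low correlation
    and mean/scale mismatch between predictions and gold ratings."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mp, mg = pred.mean(), gold.mean()
    vp, vg = pred.var(), gold.var()           # population variances
    cov = ((pred - mp) * (gold - mg)).mean()  # population covariance
    return 2.0 * cov / (vp + vg + (mp - mg) ** 2)
```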
References
[1] DINO H I, ABDULRAZZAQ M B. Facial expression classification based on SVM, KNN and MLP classifiers[C]// Proceedings of the 2019 International Conference on Advanced Science and Engineering, Duhok, Apr 2-4, 2019. Piscataway: IEEE, 2019: 70-75.
[2] PERVEEN N, ROY D, CHALAVADI K M. Facial expression recognition in videos using dynamic kernels[J]. IEEE Transactions on Image Processing, 2020, 29: 8316-8325.
[3] SHRIVASTAVA V, RICHHARIYA V, RICHHARIYA V. Puzzling out emotions: a deep-learning approach to multimodal sentiment analysis[C]// Proceedings of the 2018 International Conference on Advanced Computation and Telecommunication, Bhopal, Dec 28-29, 2018. Piscataway: IEEE, 2018: 1-6.
[4] SCHERER K R. Psychological models of emotion[J]. The Neuropsychology of Emotion, 2000, 137(3): 137-162.
[5] AMMEN S, ALFARRAS M, HADI W. OFDM system performance enhancement using discrete wavelet transform and DSSS system over mobile channel[R]. Advances in Computer Science and Engineering, 2010: 142-147.
[6] LIANG J J, CHEN S Z, JIN Q. Semi-supervised multimodal emotion recognition with improved Wasserstein GANs[C]// Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Lanzhou, Nov 18-21, 2019. Piscataway: IEEE, 2019: 695-703.
[7] AL-SULTAN M R, AMEEN S Y, ABDUALLAH W M. Real time implementation of stegofirewall system[J]. International Journal of Computing and Digital Systems, 2019, 8(5): 498-504.
[8] ZHANG Y Y, WANG Z R, DU J. Deep fusion: an attention guided factorized bilinear pooling for audio-video emotion recognition[C]// Proceedings of the 2019 International Joint Conference on Neural Networks, Budapest, Jul 14-19, 2019. Piscataway: IEEE, 2019: 1-8.
[9] CHEN J, LV Y, XU R, et al. Automatic social signal analysis: facial expression recognition using difference convolution neural network[J]. Journal of Parallel and Distributed Computing, 2019, 131: 97-102.
[10] GHALEB E, POPA M, ASTERIADIS S. Multimodal and temporal perception of audio-visual cues for emotion recognition[C]// Proceedings of the 2019 8th International Conference on Affective Computing and Intelligent Interaction, Cambridge, Sep 3-6, 2019. Piscataway: IEEE, 2019: 552-558.
[11] ABDULRAZZAQ M B, KHALAF K I. Handwritten numerals' recognition in Kurdish language using double feature selection[C]// Proceedings of the 2019 2nd International Conference on Engineering Technology and Its Applications, Al-Najef, Aug 27-28, 2019. Piscataway: IEEE, 2019: 167-172.
[12] CAIHUA C. Research on multi-modal Mandarin speech emotion recognition based on SVM[C]// Proceedings of the 2019 IEEE International Conference on Power, Intelligent Computing and Systems, Shenyang, Jul 12-14, 2019. Piscataway: IEEE, 2019: 173-176.
[13] SCHULLER B W, VALSTER M F, EYBEN F, et al. AVEC 2012: the continuous audio/visual emotion challenge[C]// Proceedings of the 2012 International Conference on Multimodal Interaction, Santa Monica, Oct 22-26, 2012. New York: ACM, 2012: 449-456.
[14] DHALL A, GOECKE R, JOSHI J, et al. Emotion recognition in the wild challenge 2013[C]// Proceedings of the 2013 International Conference on Multimodal Interaction, Sydney, Dec 9-13, 2013. New York: ACM, 2013: 509-516.
[15] STAPPEN L, BAIRD A, RIZOS G, et al. MuSe 2020 Challenge and Workshop: multimodal sentiment analysis, emotion target engagement and trustworthiness detection in real-life media: emotional car reviews in-the-wild[C]// Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-Life Media Challenge and Workshop, Seattle, Oct 16, 2020. New York: ACM, 2020: 35-44.
[16] STAPPEN L, MEßNER E M, CAMBRIA E, et al. MuSe 2021 challenge: multimodal emotion, sentiment, physiological-emotion, and stress detection[C]// Proceedings of the 2021 ACM International Conference on Multimedia, Oct 20-24, 2021. New York: ACM, 2021: 5706-5707.
[17] LI Y, TAO J H, SCHULLER B W, et al. MEC 2016: the multimodal emotion recognition challenge of CCPR 2016[C]// Proceedings of the 7th Chinese Conference on Pattern Recognition, Chengdu, Nov 5-7, 2016. Cham: Springer, 2016: 667-678.
[18] OBAID K B, ZEEBAREE S, AHMED O M. Deep learning models based on image classification: a review[J]. International Journal of Science Business, 2020, 4(11): 75-81.
[19] ZHAO X, SHI X, ZHANG S. Facial expression recognition via deep learning[J]. IETE Technical Review, 2015, 32(5): 347-355.
[20] SCHMIDHUBER J. Deep learning in neural networks: an overview[J]. Neural Networks, 2015, 61: 85-117.
[21] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[22] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[23] ELMAN J L. Finding structure in time[J]. Cognitive Science, 1990, 14(2): 179-211.
[24] D'MELLO S K, KORY J. A review and meta-analysis of multimodal affect detection systems[J]. ACM Computing Surveys, 2015, 47(3): 1-36.
[25] RISH I. An empirical study of the naive Bayes classifier[C]// Proceedings of the 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, 2001: 41-46.
[26] KEERTHI S S, SHEVADE S K, BHATTACHARYYA C, et al. Improvements to Platt's SMO algorithm for SVM classifier design[J]. Neural Computation, 2001, 13(3): 637-649.
[27] WINDEATT T. Accuracy/diversity and ensemble MLP classifier design[J]. IEEE Transactions on Neural Networks, 2006, 17(5): 1194-1211.
[28] MARTIN O, KOTSIA I, MACQ B, et al. The eNTERFACE'05 audio-visual emotion database[C]// Proceedings of the 22nd International Conference on Data Engineering Workshops, Atlanta, Apr 3-7, 2006. Washington: IEEE Computer Society, 2006: 8.
[29] WANG Y, GUAN L. Recognizing human emotional state from audiovisual signals[J]. IEEE Transactions on Multimedia, 2008, 10(5): 936-946.
[30] BUSSO C, BULUT M, LEE C C, et al. IEMOCAP: interactive emotional dyadic motion capture database[J]. Language Resources and Evaluation, 2008, 42(4): 335-359.
[31] DHALL A, GOECKE R, LUCEY S, et al. Collecting large, richly annotated facial-expression databases from movies[J]. IEEE Multimedia, 2012, 19(3): 34-41.
[32] ZHALEHPOUR S, ONDER O, AKHTAR Z, et al. BAUM-1: a spontaneous audio-visual face database of affective and mental states[J]. IEEE Transactions on Affective Computing, 2016, 8(3): 300-313.
[33] ZADEH A, ZELLERS R, PINCUS E, et al. Multimodal sentiment intensity analysis in videos: facial gestures and verbal messages[J]. IEEE Intelligent Systems, 2016, 31(6): 82-88.
[34] PEREPELKINA O, KAZIMIROVA E, KONSTANTINOVA M. RAMAS: Russian multimodal corpus of dyadic interaction for affective computing[C]// LNCS 11096: Proceedings of the 20th International Conference on Speech and Computer, Leipzig, Sep 18-22, 2018. Cham: Springer, 2018: 501-510.
[35] LIVINGSTONE S R, RUSSO F A. The Ryerson audio-visual database of emotional speech and song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in North American English[J]. PLoS One, 2018, 13(5): e0196391.
[36] ZADEH A B, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Jul 15-20, 2018. Stroudsburg: ACL, 2018: 2236-2246.
[37] PORIA S, HAZARIKA D, MAJUMDER N, et al. MELD: a multimodal multi-party dataset for emotion recognition in conversations[C]// Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Jul 28-Aug 2, 2019. Stroudsburg: ACL, 2019: 527-536.
[38] YU W, XU H, MENG F, et al. CH-SIMS: a Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul 5-10, 2020. Stroudsburg: ACL, 2020: 3718-3727.
[39] CHEN J, WANG C H, WANG K J, et al. HEU emotion: a large-scale database for multimodal emotion recognition in the wild[J]. Neural Computing and Applications, 2021, 33(14): 8669-8685.
[40] DENG L, YU D. Deep learning: methods and applications[J]. Foundations and Trends in Signal Processing, 2014, 7(3/4): 197-387.
[41] FREUND Y, HAUSSLER D. Unsupervised learning of distributions of binary vectors using 2-layer networks[C]// Advances in Neural Information Processing Systems 4, Denver, Dec 2-5, 1991. San Mateo: Morgan Kaufmann, 1991: 912-919.
[42] BENGIO Y, LAMBLIN P, POPOVICI D, et al. Greedy layer-wise training of deep networks[C]// Proceedings of the 20th Annual Conference on Neural Information Processing Systems, Vancouver, Dec 4-7, 2006. Cambridge: MIT Press, 2007: 153-160.
[43] HINTON G E. Training products of experts by minimizing contrastive divergence[J]. Neural Computation, 2002, 14(8): 1771-1800.
[44] LEE H, GROSSE R B, RANGANATH R, et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations[C]// Proceedings of the 26th International Conference on Machine Learning, Montreal, Jun 14-18, 2009. New York: ACM, 2009: 609-616.
[45] WANG G M, QIAO J F, BI J, et al. TL-GDBN: growing deep belief network with transfer learning[J]. IEEE Transactions on Automation Science and Engineering, 2018, 16(2): 874-885.
[46] DENG W, LIU H L, XU J J, et al. An improved quantum-inspired differential evolution algorithm for deep belief network[J]. IEEE Transactions on Instrumentation and Measurement, 2020, 69(10): 7319-7327.
[47] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[48] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// Proceedings of the 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, Dec 3-6, 2012. Red Hook: Curran Associates, 2012: 1106-1114.
[49] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[50] SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, Jun 7-12, 2015. Washington: IEEE Computer Society, 2015: 1-9.
[51] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Jun 27-30, 2016. Washington: IEEE Computer Society, 2016: 770-778.
[52] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Jul 21-26, 2017. Washington: IEEE Computer Society, 2017: 4700-4708.
[53] TRAN D, BOURDEV L D, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 4489-4497.
[54] YANG H, YUAN C F, LI B, et al. Asymmetric 3D convolutional neural networks for action recognition[J]. Pattern Recognition, 2019, 85: 1-12.
[55] KUMAWAT S, RAMAN S. LP-3DCNN: unveiling local phase in 3D convolutional neural networks[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Piscataway: IEEE, 2019: 4903-4912.
[56] CHEN H, WANG Y, SHU H, et al. Frequency domain compact 3D convolutional neural networks[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 1638-1647.
[57] WERBOS P J. Backpropagation through time: what it does and how to do it[J]. Proceedings of the IEEE, 1990, 78(10): 1550-1560.
[58] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[59] CHUNG J, GÜLÇEHRE Ç, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv:1412.3555, 2014.
[60] ZHAO R, WANG K, SU H, et al. Bayesian graph convolution LSTM for skeleton based action recognition[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Oct 27-Nov 2, 2019. Piscataway: IEEE, 2019: 6881-6891.
[61] ZHANG S, ZHAO X, TIAN Q. Spontaneous speech emotion recognition using multiscale deep convolutional LSTM[J]. IEEE Transactions on Affective Computing, 2019. DOI: 10.1109/TAFFC.2019.2947464.
[62] XING Y, DI CATERINA G, SORAGHAN J. A new spiking convolutional recurrent neural network (SCRNN) with applications to event-based hand gesture recognition[J]. Frontiers in Neuroscience, 2020, 14: 1143.
[63] GAO Q J, ZHAO Z H, XU D, et al. Review on speech emotion recognition research[J]. CAAI Transactions on Intelligent Systems, 2020, 15(1): 1-13.
[64] LIU Z T, XU J P, WU M, et al. Review of emotional feature extraction and dimension reduction method for speech emotion recognition[J]. Chinese Journal of Computers, 2018, 41(12): 2833-2851.
[65] HAN W J, LI H F, RUAN H B, et al. Review on speech emotion recognition[J]. Journal of Software, 2014, 25(1): 37-50.
[66] ZHENG C J, WANG C L, JIA N. Survey of acoustic feature extraction in speech tasks[J]. Computer Science, 2020, 47(5): 110-119.
[67] LISCOMBE J, VENDITTI J, HIRSCHBERG J B. Classifying subject ratings of emotional speech using acoustic features[C]// Proceedings of the 8th European Conference on Speech Communication and Technology, Geneva, Sep 1-4, 2003.
[68] YACOUB S M, SIMSKE S J, LIN X F, et al. Recognition of emotions in interactive voice response systems[C]// Proceedings of the 8th European Conference on Speech Communication and Technology, Geneva, Sep 1-4, 2003.
[69] SCHMITT M, RINGEVAL F, SCHULLER B W. At the border of acoustics and linguistics: bag-of-audio-words for the recognition of emotions in speech[C]// Proceedings of the 17th Annual Conference of the International Speech Communication Association, San Francisco, Sep 8-12, 2016: 495-499.
[70] SUN H Y, HUANG L X, ZHANG X Y, et al. Speech emotion recognition based on dual-channel convolutional gated recurrent network[J/OL]. Computer Engineering and Applications (2021-10-18) [2022-02-28]. https://kns.cnki.net/kcms/detail/11.2127.TP.20211015.2021.002.html.
[71] LUENGO I, NAVAS E, HERNÁEZ I. Feature analysis and evaluation for automatic emotion identification in speech[J]. IEEE Transactions on Multimedia, 2010, 12(6): 490-501.
[72] DUTTA K, SARMA K K. Multiple feature extraction for RNN-based Assamese speech recognition for speech to text conversion application[C]// Proceedings of the 2012 International Conference on Communications, Devices and Intelligent Systems, Kolkata, Dec 28-29, 2012. Piscataway: IEEE, 2013: 1-6.
[73] MAO Q, DONG M, HUANG Z, et al. Learning salient features for speech emotion recognition using convolutional neural networks[J]. IEEE Transactions on Multimedia, 2014, 16(8): 2203-2213.
[74] CHEN J, LI H F, MA L, et al. Multi-granularity feature fusion for dimensional speech emotion recognition[J]. Journal of Signal Processing, 2017, 33(3): 374-382.
[75] YU J J, JIN Y, MA Y, et al. Emotion recognition from raw speech based on Sinc-Transformer model[J]. Journal of Signal Processing, 2021, 37(10): 1880-1888.
[76] ZHANG S Q, ZHAO X M, CHUANG Y L, et al. Feature learning via deep belief network for Chinese speech emotion recognition[C]// Proceedings of the 7th Chinese Conference on Pattern Recognition, Chengdu, Nov 5-7, 2016. Cham: Springer, 2016: 645-651.
[77] OTTL S, AMIRIPARIAN S, GERCZUK M, et al. Group-level speech emotion recognition utilising deep spectrum features[C]// Proceedings of the 2020 International Conference on Multimodal Interaction, Oct 25-29, 2020. New York: ACM, 2020: 821-826.
[78] EYBEN F, WÖLLMER M, SCHULLER B W. OpenSMILE: the Munich versatile and fast open-source audio feature extractor[C]// Proceedings of the 18th International Conference on Multimedia 2010, Firenze, Oct 25-29, 2010. New York: ACM, 2010: 1459-1462.
[79] JIANG B, ZHONG R, ZHANG Q W, et al. Survey of non-frontal facial expression recognition by using deep learning methods[J]. Computer Engineering and Applications, 2021, 57(8): 48-61.
[80] LI S, DENG W H. Deep facial expression recognition: a survey[J]. Journal of Image and Graphics, 2020, 25(11): 2306-2320.
[81] MELLOUK W, HANDOUZI W. Facial emotion recognition using deep learning: review and insights[J]. Procedia Computer Science, 2020, 175: 689-694.
[82] ZHAO X, ZHANG S. A review on facial expression recognition: feature extraction and classification[J]. IETE Technical Review, 2016, 33(5): 505-517.
[83] CHEN J, LIU X, TU P, et al. Learning person-specific models for facial expression and action unit recognition[J]. Pattern Recognition Letters, 2013, 34(15): 1964-1970.
[84] ZHANG S, ZHAO X, LEI B. Facial expression recognition based on local binary patterns and local Fisher discriminant analysis[J]. WSEAS Transactions on Signal Processing, 2012, 8(1): 21-31.
[85] CHU W S, DE LA TORRE F, COHN J F. Selective transfer machine for personalized facial expression analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(3): 529-545.
[86] BALTRUŠAITIS T, MAHMOUD M, ROBINSON P. Cross-dataset learning and person-specific normalisation for automatic action unit detection[C]// Proceedings of the 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, Ljubljana, May 4-8, 2015. Washington: IEEE Computer Society, 2015: 1-6.
[87] AHSAN T, JABID T, CHONG U P. Facial expression recognition using local transitional pattern on Gabor filtered facial images[J]. IETE Technical Review, 2013, 30(1): 47-52.
[88] LIU J, JING X J, SUN S L, et al. Feature of local Gabor spatial histogram based on dominant neighboring pixel for face recognition[J]. Journal of Beijing University of Posts and Telecommunications, 2015, 38(1): 51-54.
[89] BAH S M, MING F. An improved face recognition algorithm and its application in attendance management system[J]. Array, 2020, 5: 100014.
[90] DEEBA F, AHMED A, MEMON H, et al. LBPH-based enhanced real-time face recognition[J]. International Journal of Advanced Computer Science and Applications, 2019, 10(5): 274-280.
[91] ZHANG T, ZHENG W, CUI Z, et al. A deep neural network-driven feature learning method for multiview facial expression recognition[J]. IEEE Transactions on Multimedia, 2016, 18(12): 2528-2536.
[92] YEASIN M, BULLOT B, SHARMA R. From facial expression to level of interest: a spatio-temporal approach[C]// Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, Jun 27-Jul 2, 2004. Washington: IEEE Computer Society, 2004: 922-927.
[93] FAN X, TJAHJADI T. A spatial-temporal framework based on histogram of gradients and optical flow for facial expression recognition in video sequences[J]. Pattern Recognition, 2015, 48(11): 3407-3416.
[94] BOSCH A, ZISSERMAN A, MUÑOZ X. Representing shape with a spatial pyramid kernel[C]// Proceedings of the 6th ACM International Conference on Image and Video Retrieval, Amsterdam, Jul 9-11, 2007. New York: ACM, 2007: 401-408.
[95] LIU T, ZHOU X C, YAN X J. LDA facial expression recognition algorithm combining optical flow characteristics with Gaussian[J]. Computer Science, 2018, 45(10): 286-290.
[96] HAPPY S, ROUTRAY A. Fuzzy histogram of optical flow orientations for micro-expression recognition[J]. IEEE Transactions on Affective Computing, 2017, 10(3): 394-406.
[97] SHAO J, DONG N. Spontaneous facial expression recognition based on RGB-D dynamic sequences[J]. Journal of Computer-Aided Design & Computer Graphics, 2015, 27(5): 847-854.
[98] YI J, CHEN A, CAI Z, et al. Facial expression recognition of intercepted video sequences based on feature point movement trend and feature block texture variation[J]. Applied Soft Computing, 2019, 82: 105540.
[99] YOLCU G, OZTEL I, KAZAN S, et al. Facial expression recognition for monitoring neurological disorders based on convolutional neural network[J]. Multimedia Tools and Applications, 2019, 78(22): 31581-31603.
[100] SUN N, LI Q, HUAN R, et al. Deep spatial-temporal feature fusion for facial expression recognition in static images[J]. Pattern Recognition Letters, 2019, 119: 49-61.
[101] ZHANG P, KONG W W, TENG J B. Facial expression recognition based on multi-scale feature attention mechanism[J]. Computer Engineering and Applications, 2022, 58(1): 182-189.
[102] SEPAS-MOGHADDAM A, ETEMAD S A, PEREIRA F, et al. Facial emotion recognition using light field images with deep attention-based bidirectional LSTM[C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, May 4-8, 2020. Piscataway: IEEE, 2020: 3367-3371.
[103] CUI Z Y, PI J T, CHEN Y, et al. Facial expression recognition combined with improved VGGNet and Focal Loss[J]. Computer Engineering and Applications, 2021, 57(19): 171-178.
[104] ZHENG J, ZHENG C, LIU H, et al. Deep convolutional neural network fusing local feature and two-stage attention weight learning for facial expression recognition[J]. Application Research of Computers, 2022, 39(3): 889-894.
[105] JUNG H, LEE S, YIM J, et al. Joint fine-tuning in deep neural networks for facial expression recognition[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Dec 7-13, 2015. Washington: IEEE Computer Society, 2015: 2983-2991.
[106] JAISWAL S, VALSTAR M F. Deep learning the dynamic appearance and shape of facial action units[C]// Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision, Lake Placid, Mar 7-10, 2016. Washington: IEEE Computer Society, 2016: 1-8.
[107] FAN Y, LU X J, LI D, et al. Video-based emotion recognition using CNN-RNN and C3D hybrid networks[C]// Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Nov 12-16, 2016. New York: ACM, 2016: 445-450.
[108] KIM D H, BADDAR W J, JANG J, et al. Multi-objective based spatio-temporal feature representation learning robust to expression intensity variations for facial expression recognition[J]. IEEE Transactions on Affective Computing, 2017, 10(2): 223-236.
[109] YU Z, LIU G, LIU Q, et al. Spatio-temporal convolutional features with nested LSTM for facial expression recognition[J]. Neurocomputing, 2018, 317: 50-57.
[110] LIANG D, LIANG H, YU Z, et al. Deep convolutional BiLSTM fusion network for facial expression recognition[J]. The Visual Computer, 2020, 36(3): 499-508.
[111] SIMA Y, YI J Z, CHEN A B, et al. Fully expression frame localization and recognition based on dynamic face image sequences[J]. Journal of Applied Sciences, 2021, 39(3): 357-366.
[112] MENG D B, PENG X J, WANG K, et al. Frame attention networks for facial expression recognition in videos[C]// Proceedings of the 2019 IEEE International Conference on Image Processing, Taipei, China, Sep 22-25, 2019. Piscataway: IEEE, 2019: 3866-3870.
[113] PAN X, ZHANG S, GUO W, et al. Video-based facial expression recognition using deep temporal-spatial networks[J]. IETE Technical Review, 2020, 37(4): 402-409.
[114] SOUMYA G K, JOSEPH S. Text classification by augmenting bag of words (BOW) representation with co-occurrence feature[J]. IOSR Journal of Computer Engineering, 2014, 16(1): 34-38.
[115] ZHAO R, MAO K. Fuzzy bag-of-words model for document representation[J]. IEEE Transactions on Fuzzy Systems, 2017, 26(2): 794-804.
[116] TRSTENJAK B, MIKAC S, DONKO D. KNN with TF-IDF based framework for text categorization[J]. Procedia Engineering, 2014, 69: 1356-1364.
[117] KIM D, SEO D, CHO S, et al. Multi-co-training for document classification using various document representations: TFIDF, LDA, and Doc2Vec[J]. Information Sciences, 2019, 477: 15-29.
[118] DEERWESTER S C, DUMAIS S T, LANDAUER T K, et al. Indexing by latent semantic analysis[J]. Journal of the Association for Information Science & Technology, 1990, 41(6): 391-407.
[119] HOFMANN T. Probabilistic latent semantic analysis[C]// Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, Stockholm, Jul 30-Aug 1, 1999. San Mateo: Morgan Kaufmann, 1999: 289-296.
[120] BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[121] DENG J W, REN F J. A survey of textual emotion recognition and its challenges[J]. IEEE Transactions on Affective Computing, 2021: 1.
[122] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]// Advances in Neural Information Processing Systems 26, Lake Tahoe, Dec 5-8, 2013. Red Hook: Curran Associates, 2013: 3111-3119.
[123] PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Oct 25-29, 2014. Stroudsburg: ACL, 2014: 1532-1543.
[124] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, Jun 1-6, 2018. Stroudsburg: ACL, 2018: 2227-2237.
[125] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Advances in Neural Information Processing Systems 30, Dec 4-9, 2017. Red Hook: Curran Associates, 2017: 5998-6008.
[126] CHUNG Y A, GLASS J R. Generative pre-training for speech with autoregressive predictive coding[C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, May 4-8, 2020. Piscataway: IEEE, 2020: 3497-3501.
[127] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Jun 2-7, 2019. Stroudsburg: ACL, 2019: 4171-4186.
[128] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.
[129] BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[C]// Advances in Neural Information Processing Systems 33, Dec 6-12, 2020: 1877-1901.
[130] DAI Z H, YANG Z L, YANG Y M, et al. Transformer-XL: attentive language models beyond a fixed-length context[C]// Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Jul 28-Aug 2, 2019. Stroudsburg: ACL, 2019: 2978-2988.
[131] YANG Z L, DAI Z H, YANG Y M, et al. XLNet: generalized autoregressive pretraining for language understanding[C]// Advances in Neural Information Processing Systems 32, Vancouver, Dec 8-14, 2019: 5754-5764.
[132] TANG D, WEI F, YANG N, et al. Learning sentiment-specific word embedding for twitter sentiment classification[C]// Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2014: 1555-1565.
[133] FELBO B, MISLOVE A, SØGAARD A, et al. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Sep 9-11, 2017. Stroudsburg: ACL, 2017: 1615-1625.
[134] XU P, MADOTTO A, WU C S, et al. Emo2Vec: learning generalized emotion representation by multi-task training[C]// Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Brussels, Oct 31, 2018. Stroudsburg: ACL, 2018: 292-298.
[135] SHI B, FU Z, BING L, et al. Learning domain-sensitive and sentiment-aware word embeddings[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Jul 15-20, 2018. Stroudsburg: ACL, 2018: 2494-2504.
[136] ABDULLAH S M S A, AMEEN S Y A, SADEEQ M A, et al. Multimodal emotion recognition using deep learning[J]. Journal of Applied Science and Technology Trends, 2021, 2(2): 52-58.
[137] SHOUMY N J, ANG L M, SENG K P, et al. Multimodal big data affective analytics: a comprehensive survey using text, audio, visual and physiological signals[J]. Journal of Network and Computer Applications, 2020, 149: 102447.
[138] SUN Z, SONG Q, ZHU X, et al. A novel ensemble method for classifying imbalanced data[J]. Pattern Recognition, 2015, 48(5): 1623-1637.
[139] HUANG J, TAO J H, LIU B, et al. Multimodal transformer fusion for continuous emotion recognition[C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, May 4-8, 2020. Piscataway: IEEE, 2020: 3507-3511.
[140] RINGEVAL F, SCHULLER B W, VALSTAR M F, et al. AVEC 2017: real-life depression, and affect recognition workshop and challenge[C]// Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, Oct 23-27, 2017. New York: ACM, 2017: 3-9.
[141] LIU J J, WU X F. Real-time multimodal emotion recognition and emotion space labeling using LSTM networks[J]. Journal of Fudan University (Natural Science), 2020, 59(5): 565-574.
[142] LIU J X, CHEN S, WANG L B, et al. Multimodal emotion recognition with capsule graph convolutional based representation fusion[C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, Jun 6-11, 2021. Piscataway: IEEE, 2021: 6339-6343.
[143] WANG C Y, LI W X, CHEN Z H. Research of multi-modal emotion recognition based on voice and video images[J]. Computer Engineering and Applications, 2021, 57(23): 163-170.
[144] HAZARIKA D, GORANTLA S, PORIA S, et al. Self-attentive feature-level fusion for multimodal emotion detection[C]// Proceedings of the IEEE 1st Conference on Multimedia Information Processing and Retrieval, Miami, Apr 10-12, 2018. Piscataway: IEEE, 2018: 196-201.
[145] BOJANOWSKI P, GRAVE E, JOULIN A, et al. Enriching word vectors with subword information[J]. Transactions of the Association for Computational Linguistics, 2017, 5: 135-146.
[146] PRIYASAD D, FERNANDO T, DENMAN S, et al. Attention driven fusion for multi-modal emotion recognition[C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, May 4-8, 2020. Piscataway: IEEE, 2020: 3227-3231.
[147] KRISHNA D N, PATIL A. Multimodal emotion recognition using cross-modal attention and 1D convolutional neural networks[C]// Proceedings of the 21st Annual Conference of the International Speech Communication Association, Shanghai, Oct 25-29, 2020: 4243-4247.
[148] LIAN Z, LIU B, TAO J H. CTNet: conversational transformer network for emotion recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 985-1000.
[149] WANG L X, WANG W Y, CHENG X. Bimodal emotion recognition model for speech-text based on Bi-LSTM-CNN[J]. Computer Engineering and Applications, 2022, 58(4): 192-197.
[150] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv:1301.3781, 2013.
[151] PORIA S, CAMBRIA E, HAZARIKA D, et al. Multi-level multiple attentions for contextual multimodal sentiment analysis[C]// Proceedings of the 2017 IEEE International Conference on Data Mining, New Orleans, Nov 18-21, 2017. Washington: IEEE Computer Society, 2017: 1033-1038.
[152] PAN Z X, LUO Z J, YANG J C, et al. Multi-modal attention for speech emotion recognition[C]// Proceedings of the 21st Annual Conference of the International Speech Communication Association, Oct 25-29, 2020: 364-368.
[153] MITTAL T, BHATTACHARYA U, CHANDRA R, et al. M3ER: multiplicative multimodal emotion recognition using facial, textual, and speech cues[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 1359-1367.
[154] SIRIWARDHANA S, KALUARACHCHI T, BILLINGHURST M, et al. Multimodal emotion recognition with transformer-based self supervised feature fusion[J]. IEEE Access, 2020, 8: 176274-176285.
[155] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[J]. arXiv:1907.11692, 2019.
[156] ZADEH A, LIANG P P, PORIA S, et al. Multi-attention recurrent network for human communication comprehension[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence, the 30th Innovative Applications of Artificial Intelligence, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, Feb 2-7, 2018. Menlo Park: AAAI, 2018: 5642-5649.
[157] MAI S J, HU H F, XU J, et al. Multi-fusion residual memory network for multimodal human sentiment comprehension[J]. IEEE Transactions on Affective Computing, 2022, 13(1): 320-334.
[158] MAAS A, DALY R E, PHAM P T, et al. Learning word vectors for sentiment analysis[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Jun 19-24, 2011. Stroudsburg: ACL, 2011: 142-150.
[159] WANG Z L, WAN Z H, WAN X J. TransModality: an End2End fusion method with transformer for multimodal sentiment analysis[C]// Proceedings of the Web Conference 2020, Taipei, China, Apr 20-24, 2020. New York: ACM, 2020: 2514-2520.
[160] DAI W L, CAHYAWIJAYA S, LIU Z H, et al. Multimodal end-to-end sparse model for emotion recognition[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun 6-11, 2021. Stroudsburg: ACL, 2021: 5305-5316.
[161] REN M, HUANG X, SHI X, et al. Interactive multimodal attention network for emotion recognition in conversation[J]. IEEE Signal Processing Letters, 2021, 28: 1046-1050.
[162] KHARE A, PARTHASARATHY S, SUNDARAM S. Self-supervised learning with cross-modal transformers for emotion recognition[C]// Proceedings of the 2021 IEEE Spoken Language Technology Workshop, Shenzhen, Jan 19-22, 2021. Piscataway: IEEE, 2021: 381-388.
[163] HE Y H, ZHANG X Y, SUN J. Channel pruning for accelerating very deep neural networks[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Oct 22-29, 2017. Washington: IEEE Computer Society, 2017: 1389-1397.
[164] LI H, KADAV A, DURDANOVIC I, et al. Pruning filters for efficient ConvNets[J]. arXiv:1608.08710, 2016.
[165] ESCALANTE H J, KAYA H, SALAH A A, et al. Modeling, recognizing, and explaining apparent personality from videos[J]. IEEE Transactions on Affective Computing, 2020: 1.
[166] ANGELOV P, SOARES E. Towards explainable deep neural networks (xDNN)[J]. Neural Networks, 2020, 130: 185-194.
[167] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Advances in Neural Information Processing Systems 27, Montreal, Dec 8-13, 2014. Red Hook: Curran Associates, 2014: 2672-2680.
[168] MAKHZANI A, SHLENS J, JAITLY N, et al. Adversarial autoencoders[J]. arXiv:1511.05644, 2015.
[169] MAI S J, HU H F, XING S L. Modality to modality translation: an adversarial representation learning and graph fusion network for multimodal fusion[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence, the 32nd Innovative Applications of Artificial Intelligence Conference, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, Feb 7-12, 2020. Menlo Park: AAAI, 2020: 164-172.
[170] WANG Z M, ZHAO Y P, ZHENG R L, et al. A survey of research on EEG signal emotion recognition[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 760-774.
[171] YANG C J, FAHIER N, LI W C, et al. A convolution neural network based emotion recognition system using multimodal physiological signals[C]// Proceedings of the 2020 IEEE International Conference on Consumer Electronics, Taoyuan, China, Sep 28-30, 2020. Piscataway: IEEE, 2020: 1-2.
[172] WU J, ZHANG Y, ZHAO X, et al. A generalized zero-shot framework for emotion recognition from body gestures[J]. arXiv:2010.06362, 2020.
[173] GAO J, LI P, CHEN Z K, et al. A survey on deep learning for multimodal data fusion[J]. Neural Computation, 2020, 32(5): 829-864.