[1] PLUTCHIK R. The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice[J]. American Scientist, 2001, 89(4): 344-350.
[2] EKMAN P, FRIESEN W V. Constants across cultures in the face and emotion[J]. Journal of Personality and Social Psychology, 1971, 17(2): 124.
[3] PORIA S, CAMBRIA E, BAJPAI R, et al. A review of affective computing: from unimodal analysis to multimodal fusion[J]. Information Fusion, 2017, 37: 98-125.
[4] PENG X J. Multi-modal affective computing: a comprehensive survey[J]. Journal of Hengyang Normal University, 2018, 39(3): 31-36.
彭小江. 基于多模态信息的情感计算综述[J]. 衡阳师范学院学报, 2018, 39(3): 31-36.
[5] SOLEYMANI M, GARCIA D, JOU B, et al. A survey of multimodal sentiment analysis[J]. Image and Vision Computing, 2017, 65: 3-14.
[6] HUDDAR M G, SANNAKKI S S, RAJPUROHIT V S. A survey of computational approaches and challenges in multimodal sentiment analysis[J]. International Journal of Computer Sciences and Engineering, 2019, 7(1): 876-883.
[7] GAO J, LI P, CHEN Z, et al. A survey on deep learning for multimodal data fusion[J]. Neural Computation, 2020, 32(5): 829-864.
[8] ZHENG W L, LU B L. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks[J]. IEEE Transactions on Autonomous Mental Development, 2015, 7(3): 162-175.
[9] LI S, DENG W. Deep facial expression recognition: a survey[J]. IEEE Transactions on Affective Computing, 2020.
[10] ZHANG Y, LAI G, ZHANG M, et al. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis[C]//Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, Queensland, Jul 6-11, 2014. New York: ACM, 2014: 83-92.
[11] XU N, MAO W J, CHEN G D. Multi-interactive memory network for aspect based multimodal sentiment analysis[C]//Proceedings of the 2019 AAAI Conference on Artificial Intelligence, Hawaii, Jan 27-Feb 1, 2019. Menlo Park: AAAI, 2019: 371-378.
[12] KOELSTRA S, MUHL C, SOLEYMANI M, et al. DEAP: a database for emotion analysis; using physiological signals[J]. IEEE Transactions on Affective Computing, 2011, 3(1): 18-31.
[13] YU W M, XU H, MENG F Y, et al. CH-SIMS: a Chinese multimodal sentiment analysis dataset with fine-grained annotations of modality[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul 5-10, 2020. Stroudsburg: ACL, 2020: 3718-3727.
[14] MORENCY L P, MIHALCEA R, DOSHI P. Towards multimodal sentiment analysis: harvesting opinions from the Web[C]//Proceedings of the 13th International Conference on Multimodal Interfaces, Alicante, Nov 14-18, 2011. New York: ACM, 2011: 169-176.
[15] WÖLLMER M, WENINGER F, KNAUP T, et al. YouTube movie reviews: sentiment analysis in an audio-visual context[J]. IEEE Intelligent Systems, 2013, 28(3): 46-53.
[16] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. arXiv:1606.06259, 2016.
[17] ELLIS J G, JOU B, CHANG S F. Why we watch the news: a dataset for exploring sentiment in broadcast video news[C]//Proceedings of the 16th International Conference on Multimodal Interaction, Istanbul, Nov 12-16, 2014. New York: ACM, 2014: 104-111.
[18] BUSSO C, BULUT M, LEE C C, et al. IEMOCAP: inter-active emotional dyadic motion capture database[J]. Language Resources and Evaluation, 2008, 42(4): 335-359.
[19] FENG X Y, HUANG D, CUI S X, et al. Spatial-temporal attention network for facial expression recognition[J]. Journal of Northwest University (Natural Science Edition), 2020, 50(3): 319-327.
冯晓毅, 黄东, 崔少星, 等. 基于空时注意力网络的面部表情识别[J]. 西北大学学报(自然科学版), 2020, 50(3): 319-327.
[20] LU J H, ZHANG S M, ZHAO J L. Facial expression recognition based on CNN ensemble[J]. Journal of Qingdao University (Engineering & Technology Edition), 2020, 35(2): 24-29.
陆嘉慧, 张树美, 赵俊莉. 基于CNN集成的面部表情识别[J]. 青岛大学学报(工程技术版), 2020, 35(2): 24-29.
[21] LI X L, NIU H T. Facial expression recognition using feature fusion based on VGG-NET[J]. Computer Engineering & Science, 2020, 42(3): 500-509.
李校林, 钮海涛. 基于VGG-NET的特征融合面部表情识别[J]. 计算机工程与科学, 2020, 42(3): 500-509.
[22] LUO Y, ZHU L Z, LU B L. A GAN-based data augmentation method for multimodal emotion recognition[C]//LNCS 11554: Proceedings of the 16th International Symposium on Neural Networks, Moscow, Jul 10-12, 2019. Cham: Springer, 2019: 141-150.
[23] WANG K, PENG X, YANG J, et al. Suppressing uncertainties for large-scale facial expression recognition[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, Jun 13-19, 2020. Piscataway: IEEE, 2020: 6897-6906.
[24] LI S, DENG W H. A deeper look at facial expression dataset bias[J]. IEEE Transactions on Affective Computing, 2020.
[25] DAI R. Facial expression recognition method based on facial physiological features and deep learning[J]. Journal of Chongqing University of Technology (Natural Science), 2020, 34(6): 146-153.
戴蓉. 基于面部生理特征和深度学习的表情识别方法[J]. 重庆理工大学学报(自然科学), 2020, 34(6): 146-153.
[26] YADOLLAHI A, SHAHRAKI A G, ZAIANE O R. Current state of text sentiment analysis from opinion to emotion mining[J]. ACM Computing Surveys, 2017, 50(2): 1-33.
[27] ARAQUE O, ZHU G, IGLESIAS C A. A semantic similarity-based perspective of affect lexicons for sentiment analysis[J]. Knowledge-Based Systems, 2019, 165: 346-359.
[28] ZHAO Y Y, QIN B, LIU T. Sentiment analysis[J]. Journal of Software, 2010, 21(8): 1834-1848.
赵妍妍, 秦兵, 刘挺. 文本情感分析[J]. 软件学报, 2010, 21(8): 1834-1848.
[29] CHEN F, YUAN Z, HUANG Y. Multi-source data fusion for aspect-level sentiment classification[J]. Knowledge-Based Systems, 2020, 187: 104831.
[30] LI Z, FAN Y, JIANG B, et al. A survey on sentiment analysis and opinion mining for social multimedia[J]. Multimedia Tools and Applications, 2019, 78(6): 6939-6967.
[31] PANG B, LEE L, VAITHYANATHAN S. Thumbs up? Sentiment classification using machine learning techniques[C]//Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, Philadelphia, Jul 6-7, 2002. Stroudsburg: ACL, 2002: 79-86.
[32] LI T T, JI D H. Sentiment analysis of micro-blog based on SVM and CRF using various combinations of features[J]. Application Research of Computers, 2015, 32(4): 978-981.
李婷婷, 姬东鸿. 基于 SVM 和 CRF 多特征组合的微博情感分析[J]. 计算机应用研究, 2015, 32(4): 978-981.
[33] PRUSA J D, KHOSHGOFTAAR T M, DITTMAN D J. Using ensemble learners to improve classifier performance on Tweet sentiment data[C]//Proceedings of the 2015 IEEE International Conference on Information Reuse and Integration, San Francisco, Aug 13-15, 2015. Piscataway: IEEE, 2015: 252-257.
[34] CHEN J Y, YAN S K, WONG K C. Verbal aggression detection on Twitter comments: convolutional neural network for short-text sentiment analysis[J]. Neural Computing and Applications, 2020, 32(15): 10809-10818.
[35] CHEN K, LIANG B, KE W D, et al. Chinese micro-blog sentiment analysis based on multi-channels convolutional neural networks[J]. Journal of Computer Research and Development, 2018, 55(5): 945-957.
陈珂, 梁斌, 柯文德, 等. 基于多通道卷积神经网络的中文微博情感分析[J]. 计算机研究与发展, 2018, 55(5): 945-957.
[36] ZHU Y, CHEN S P. Commentary text sentiment analysis combining convolution neural network and attention[J]. Journal of Chinese Computer Systems, 2020, 41(3): 551-557.
朱烨, 陈世平. 融合卷积神经网络和注意力的评论文本情感分析[J]. 小型微型计算机系统, 2020, 41(3): 551-557.
[37] CAO Y, LI T R, JIA Z, et al. BGRU: new method of Chinese text sentiment analysis[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(6): 973-981.
曹宇, 李天瑞, 贾真, 等. BGRU: 中文文本情感分析的新方法[J]. 计算机科学与探索, 2019, 13(6): 973-981.
[38] WANG X, JIANG W, LUO Z Y. Combination of convolutional and recurrent neural network for sentiment analysis of short texts[C]//Proceedings of the 26th International Conference on Computational Linguistics, Osaka, Dec 11-16, 2016. Stroudsburg: ACL, 2016: 2428-2437.
[39] LI Y, PAN Q, WANG S, et al. A generative model for category text generation[J]. Information Sciences, 2018, 450: 301-315.
[40] PAO T L, CHEN Y T, YEH J H, et al. Detecting emotions in Mandarin speech[J]. International Journal of Computational Linguistics & Chinese Language Processing, 2005, 10(3): 347-362.
[41] LI Y C, ISHI C T, WARD N, et al. Emotion recognition by combining prosody and sentiment analysis for expressing reactive emotion by humanoid robot[C]//Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Kuala Lumpur, Dec 12-15, 2017. Piscataway: IEEE, 2017: 1356-1359.
[42] SEMWAL N, KUMAR A, NARAYANAN S. Automatic speech emotion detection system using multi-domain acoustic feature selection and classification models[C]//Proceedings of the 2017 IEEE International Conference on Identity, Security and Behavior Analysis, New Delhi, Feb 22-24, 2017. Piscataway: IEEE, 2017: 1-6.
[43] SAMANTARAY A K, MAHAPATRA K, KABI B, et al. A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages[C]//Proceedings of the 2nd IEEE International Conference on Recent Trends in Information Systems, Kolkata, Jul 9-11, 2015. Piscataway: IEEE, 2015: 372-377.
[44] HUANG Z W, DONG M, MAO Q R, et al. Speech emotion recognition using CNN[C]//Proceedings of the 2014 ACM International Conference on Multimedia, Orlando, Nov 3-7, 2014. New York: ACM, 2014: 801-804.
[45] REN Z, JIA J, GUO Q, et al. Acoustics, content and geo-information based sentiment prediction from large-scale networked voice data[C]//Proceedings of the 2014 IEEE International Conference on Multimedia and Expo, Chengdu, Jul 14-18, 2014. Piscataway: IEEE, 2014: 1-4.
[46] WU L, OVIATT S L, COHEN P R. Multimodal integration—a statistical view[J]. IEEE Transactions on Multimedia, 1999, 1(4): 334-341.
[47] ZHANG C, YANG Z C, HE X D, et al. Multimodal intelligence: representation learning, information fusion, and applications[J]. IEEE Journal of Selected Topics in Signal Processing, 2020, 14(3): 478-493.
[48] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[49] SUN Y Y, JIA Z T, ZHU H Y. Survey of multimodal deep learning[J]. Computer Engineering and Applications, 2020, 56(21): 1-10.
孙影影, 贾振堂, 朱昊宇. 多模态深度学习综述[J]. 计算机工程与应用, 2020, 56(21): 1-10.
[50] PÉREZ-ROSAS V, MIHALCEA R, MORENCY L P. Utterance-level multimodal sentiment analysis[C]//Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Aug 4-9, 2013. Stroudsburg: ACL, 2013: 973-982.
[51] PORIA S, CAMBRIA E, HAZARIKA D, et al. Context-dependent sentiment analysis in user-generated videos[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Jul 30-Aug 4, 2017. Stroudsburg: ACL, 2017: 873-883.
[52] DENG D, ZHOU Y, PI J, et al. Multimodal utterance-level affect analysis using visual, audio and text features[J]. arXiv:1805.00625, 2018.
[53] PORIA S, CHATURVEDI I, CAMBRIA E, et al. Convolutional MKL based multimodal emotion recognition and sentiment analysis[C]//Proceedings of the 16th International Conference on Data Mining, Barcelona, Dec 12-15, 2016. Piscataway: IEEE, 2016: 439-448.
[54] HU T T, SHEN L J, FENG Y Q, et al. Research on misclassification of anger and happiness in speech and text emotion recognition[J]. Computer Technology and Development, 2018, 28(11): 124-127.
胡婷婷, 沈凌洁, 冯亚琴, 等. 语音与文本情感识别中愤怒与开心误判分析[J]. 计算机技术与发展, 2018, 28(11): 124-127.
[55] CHEN F, LUO Z, XU Y, et al. Complementary fusion of multi-features and multi-modalities in sentiment analysis[J]. arXiv:1904.08138, 2019.
[56] KUMAR A, VEPA J. Gated mechanism for attention based multimodal sentiment analysis[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, May 4-8, 2020. Piscataway: IEEE, 2020: 4477-4481.
[57] XU N, MAO W J. MultiSentiNet: a deep semantic network for multimodal sentiment analysis[C]//Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, Nov 6-10, 2017. New York: ACM, 2017: 2399-2402.
[58] YU J, JIANG J, XIA R. Entity-sensitive attention and fusion network for entity-level multimodal sentiment classification[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 28: 429-439.
[59] YU Y, LIN H, MENG J, et al. Visual and textual sentiment analysis of a microblog using deep convolutional neural networks[J]. Algorithms, 2016, 9(2): 41.
[60] PORIA S, CAMBRIA E, GELBUKH A. Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Sep 17-21, 2015. Stroudsburg: ACL, 2015: 2539-2544.
[61] WANG H H, MEGHAWAT A, MORENCY L P, et al. Select-additive learning: improving generalization in multimodal sentiment analysis[C]//Proceedings of the 2017 IEEE International Conference on Multimedia and Expo, Hong Kong, China, Jul 10-14, 2017. Washington: IEEE Computer Society, 2017: 949-954.
[62] YU H L, GUI L K, MADAIO M, et al. Temporally selective attention model for social and affective state recognition in multimedia content[C]//Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, Oct 23-27, 2017. New York: ACM, 2017: 1743-1751.
[63] CHEN M H, WANG S, LIANG P P, et al. Multimodal sentiment analysis with word-level fusion and reinforcement learning[C]//Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, Nov 13-17, 2017. New York: ACM, 2017: 163-171.
[64] SHENOY A, SARDANA A. Multilogue-Net: a context aware RNN for multi-modal emotion detection and sentiment analysis in conversation[J]. arXiv:2002.08267, 2020.
[65] CIMTAY Y, EKMEKCIOGLU E, CAGLAR-OZHAN S. Cross-subject multimodal emotion recognition based on hybrid fusion[J]. IEEE Access, 2020, 8: 168865-168878.
[66] GUNES H, PICCARDI M. Bi-modal emotion recognition from expressive face and body gestures[J]. Journal of Network and Computer Applications, 2007, 30(4): 1334-1345.
[67] FIERREZ-AGUILAR J, ORTEGA-GARCIA J, GONZALEZ-RODRIGUEZ J. Fusion strategies in multimodal biometric verification[C]//Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, Baltimore, Jul 6-9, 2003. Piscataway: IEEE, 2003: 5-8.
[68] JIANG T, WANG J H, LIU Z Y, et al. Fusion-extraction network for multimodal sentiment analysis[C]//LNCS 12085: Proceedings of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Singapore, May 11-14, 2020. Cham: Springer, 2020: 785-797.
[69] JIANG D, ZOU D, DENG Z, et al. Contextual multimodal sentiment analysis with information enhancement[J]. Journal of Physics: Conference Series, 2020, 1453(1): 012159.
[70] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[J]. arXiv:1707.07250, 2017.
[71] VERMA S, WANG J, GE Z, et al. Deep-HOSeq: deep higher order sequence fusion for multimodal sentiment analysis[J]. arXiv:2010.08218, 2020.
[72] VIELZEUF V, LECHERVY A, PATEUX S, et al. CentralNet: a multilayer approach for multimodal fusion[C]//LNCS 11134: Proceedings of the 15th European Conference on Computer Vision, Munich, Sep 8-14, 2018. Cham: Springer, 2018: 575-589.
[73] MAJUMDER N, HAZARIKA D, GELBUKH A F, et al. Multimodal sentiment analysis using hierarchical fusion with context modeling[J]. Knowledge-Based Systems, 2018, 161: 124-133.