Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (6): 1260-1278.DOI: 10.3778/j.issn.1673-9418.2110056
• Surveys and Frontiers •
LIU Ying1,2,+, WANG Zhe1, FANG Jie1,2, ZHU Tingge1,2, LI Linna3, LIU Jiming4
Received: 2021-10-22
Revised: 2022-01-12
Online: 2022-06-01
Published: 2022-06-20
About author:
LIU Ying, born in 1972 in Huxian, Shaanxi, Ph.D., professor, M.S. supervisor, chief engineer of the Key Laboratory of Electronic Information Application Technology for Scene Investigation, Ministry of Public Security. Her research interests include image retrieval, image clarity, etc.
Corresponding author: + E-mail: liuying_ciip@163.com
LIU Ying, WANG Zhe, FANG Jie, ZHU Tingge, LI Linna, LIU Jiming. Multi-modal Public Opinion Analysis Based on Image and Text Fusion[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1260-1278.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2110056
| Model | Proposed by | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Boolean model | Boole | Simple to implement; computationally efficient | Discards many important text features, so it is rarely used in practice |
| Probabilistic topic model | Belkin, Croft | The semantics of each dimension can be expressed through its topic, so the model is interpretable | 1. Many parameters, hence long training time; 2. The number of topics must be set manually, introducing subjectivity |
| Vector space model | Salton | Vectors are convenient to compute with, and similarity between texts can be measured | 1. As the vocabulary grows, vector dimensionality rises and vectors become highly sparse; 2. Large errors on synonymy ("one meaning, many words") and polysemy ("one word, many meanings") |
Table 1 Text representation models and their advantages and disadvantages
| Task | Dataset | Reviews (text) | Images | Posts (image-text) | Sentiment labeled | Modality | Label scheme | Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Image-text sentiment analysis | Yelp | 44 305 | 244 569 | — | Yes | Image-text | 5-class sentiment | Yelp |
| | Tumblr | — | — | 256 897 | Yes | Image-text | 15 emotions | Tumblr |
| | MVSA | — | — | 2 592 | Yes | Image-text | 3-class sentiment | |
| Image-text aspect-level sentiment analysis | Twitter15/17 | — | — | 11 310 | Yes | Image-text | 3-class sentiment | |
| | Multi-ZOL | — | — | 5 288 | Yes | Image-text | 10-class sentiment | ZOL |
| Image-text sarcasm recognition | | — | — | 24 653 | No | Image-text | 2-class sentiment | |
Table 2 Summary of image and text datasets
| Ground truth | Predicted positive | Predicted negative |
| --- | --- | --- |
| Positive | True positive (TP) | False negative (FN) |
| Negative | False positive (FP) | True negative (TN) |
Table 3 Confusion matrix symbols used in evaluation formulas
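The confusion-matrix symbols above combine into the accuracy, recall and F1 scores reported in the result tables that follow. A minimal sketch of those formulas (the counts in the example are hypothetical):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # fraction of all samples labeled correctly
    precision = tp / (tp + fp)            # fraction of predicted positives that are real
    recall = tp / (tp + fn)               # fraction of real positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration:
m = classification_metrics(tp=80, fp=20, fn=20, tn=80)
# accuracy = 0.8, precision = 0.8, recall = 0.8, f1 = 0.8
```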
| Model | Dataset | Data size | Accuracy | F1 | Recall |
| --- | --- | --- | --- | --- | --- |
| CBM | Weibo | 5 000 | 0.800 | — | — |
| Bi-gram | Weibo | 1 620 | — | 0.759 | — |
| MultiCNN | Flickr | 1 872 | 0.780 | — | — |
| CNN | Weibo | 6 000 | 0.849 | 0.848 | 0.847 |
| FCNN-WBLSTM | Weibo | 5 101 | 0.868 | — | — |
| Co-Memory Network | MVSA | 24 819 | 0.705 | 0.700 | — |
| VistaNet | Yelp | 44 305 | 0.619 | — | — |
Table 4 Experimental results of feature layer fusion algorithms
| Method | Brief description | Advantages | Disadvantages |
| --- | --- | --- | --- |
| CBW | Uses a bag-of-words model to give text and images a unified representation, forming a feature vector for each message | Text and image are treated as a whole and can be handled by a unified method | Performs poorly when the text expresses sentiment through sarcasm while the image carries no obvious sentiment |
| Bi-gram | Fuses text and image features using the K-nearest-neighbors algorithm (KNN) and the Minkowski distance, and proposes that images can help predict the sentiment of text | On top of fusing text and image features, proposes a new similarity-based neighborhood classifier | — |
| MultiCNN | Uses two independent CNNs to learn textual and visual features; their joint representation is fed into another CNN to extract both representations | Better exploits the internal relationship between text and images | Finer-grained semantic information in the image is ignored |
| CNN | Feeds CNN-encoded images into a bidirectional LSTM network, and uses multiple-instance learning and the SSD object detection method respectively to extract global image features | Incorporates high-level semantic information from local image regions | — |
| FCNN-WBLSTM | Builds the image sentiment model via parameter transfer and fine-tuning, and the text sentiment model via word embeddings and a bidirectional network | Borrows the idea of transfer learning and thus avoids the high cost of manually labeling datasets | The FCNN model has a low recognition rate on positive Weibo posts |
| Co-Memory Network | Its key structure models the bidirectional interaction between image and text, with a final softmax for sentiment classification | Considers the mutual relationship between visual and textual information | Considers only the influence of one modality on the other (e.g., image-to-text or text-to-image) |
| VistaNet | Models visual information as attention rather than as features; treats images as auxiliary to the text rather than as independent information, using them as an attention anchor to highlight key sentences | Incorporates images as attention into review-based sentiment analysis | When a review contains sarcasm, the gap between modalities widens and sentiment inconsistency becomes more pronounced |
Table 5 Advantages and disadvantages of feature layer fusion algorithms
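A common thread in the feature-layer methods above is that each modality is encoded independently and the embeddings are merged before a single classifier. A minimal sketch of that pattern, with stand-in encoders and illustrative dimensions rather than any specific model from the table:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(text: str, dim: int = 128) -> np.ndarray:
    """Stand-in for a text encoder (e.g. an LSTM); returns a fixed-size embedding."""
    return rng.standard_normal(dim)

def encode_image(image: np.ndarray, dim: int = 128) -> np.ndarray:
    """Stand-in for an image encoder (e.g. a CNN); returns a fixed-size embedding."""
    return rng.standard_normal(dim)

def feature_level_fusion(text: str, image: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate modality embeddings into one joint feature
    vector that a single downstream classifier then consumes."""
    return np.concatenate([encode_text(text), encode_image(image)])

joint = feature_level_fusion("great food, slow service", np.zeros((224, 224, 3)))
# joint.shape == (256,): 128 text dims followed by 128 image dims
```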
| Model | Dataset | Data size | Accuracy | F-score | Recall |
| --- | --- | --- | --- | --- | --- |
| DNN | Weibo | 6 171 | 0.826 | 0.883 | 0.872 |
| CNN-EnsCla | Flickr/Twitter | 13 413/1 269 | 0.616 | — | — |
| USAMTV | Sina Weibo | 1 560 | 0.721 | 0.691 | 0.662 |
| DMAF | | 19 694 | 0.769 | 0.769 | 0.760 |
Table 6 Experimental results of decision layer fusion algorithms
| Method | Brief description | Advantages | Disadvantages |
| --- | --- | --- | --- |
| DNN | Its core is a CNN-based model that learns higher-level representations of message text and associated images | 1. Unsupervised pre-training of word vectors lets the model learn more discriminative features; 2. The visual model extracts more abstract features through a deep, large neural network with billions of parameters, and Dropout regularization effectively mitigates overfitting | The relationship between textual and visual content is often ignored |
| CNN-EnsCla | Explores the correlation between image and text sentiment features to improve the accuracy of Weibo sentiment polarity prediction | Exploits the complementarity between image features and text features | — |
| USAMTV | Adds emoticons to the ASUM model and introduces a conjunction-based sentiment transition variable to handle sentiment dependency between sentences | Makes good use of semantic information and Weibo characteristics; as an unsupervised text sentiment method, it needs no labeled training data | Recognition accuracy drops as the number of topics grows |
| DMAF | Applies a visual attention model and a semantic attention model separately, then performs intermediate fusion based on a multimodal attention mechanism | Integrates intermediate and late fusion into a unified framework for multimodal sentiment analysis, handling incomplete multimodal content more effectively | The network cannot model relations between each pair of modalities |
Table 7 Advantages and disadvantages of decision layer fusion algorithms
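Decision-layer (late) fusion, by contrast, lets each modality produce its own prediction and merges the predictions rather than the features. A minimal sketch using a hypothetical weighted average of per-modality softmax outputs; real systems in the table use ensembles or learned combination rules instead:

```python
import numpy as np

def late_fusion(text_probs: np.ndarray, image_probs: np.ndarray,
                w_text: float = 0.5) -> int:
    """Decision-level fusion: each modality's classifier outputs a class
    probability distribution; the final label comes from their weighted average."""
    fused = w_text * text_probs + (1.0 - w_text) * image_probs
    return int(np.argmax(fused))

# Hypothetical softmax outputs over (negative, neutral, positive):
text_probs = np.array([0.1, 0.3, 0.6])
image_probs = np.array([0.2, 0.5, 0.3])
label = late_fusion(text_probs, image_probs, w_text=0.6)
# fused = [0.14, 0.38, 0.48] -> label 2 (positive)
```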
| Model | Dataset | Data size | Accuracy | F-score | Recall |
| --- | --- | --- | --- | --- | --- |
| CNN-CCR | Getty Images | 101 | 0.800 | 0.800 | 0.759 |
| WS-MDL | | 435 458 | 0.695 | — | — |
| Bi-MHG | Weibo | 435 000 | 0.900 | 0.903 | — |
| CCA-LDA | Flickr | 22 843 | 0.835 | — | — |
| H-LSTM+MLP | Flickr | 163 281 | 0.881 | 0.880 | 0.879 |
| MultiSentiNet | MVSA | 5 129 | 0.689 | 0.681 | — |
Table 8 Experimental results of consistent regression fusion algorithms
| Method | Brief description | Advantages | Disadvantages |
| --- | --- | --- | --- |
| CNN-CCR | The main idea is to impose a consistency constraint on related but distinct modality features | The CCR formulation is simple and generalizable, easy to implement and effective; the model can be trained on large-scale datasets in mini-batch mode | Ignores the structured mapping between image regions and words |
| WS-MDL | Computes sentiment probability distributions and multimodal sentence consistency from pre-trained CNN and DCNN models; trains a probabilistic graphical model to weight the contributions of noisy labels, and these weights are fed back to update the CNN and DCNN parameters that produce the multimodal prediction and sentiment consistency scores | Trains a discriminative model for multimodal prediction from cheap emoticon labels | Cannot account for the order of emoticon labels |
| Bi-MHG | Builds bi-layer multimodal hypergraph learning (Bi-MHG), sharing the correlations among multimodal features in a unified two-layer learning scheme | Effectively addresses the dependency between modalities | Cannot predict multimodality-based correlations |
| CCA-LDA | Generates maximally correlated discriminative feature representations via multimodal deep multiple discriminative correlation analysis, then uses a co-attention network to interactively merge the two representations | Addresses two shortcomings of existing image-text multimedia sentiment analysis: overly simple fusion of heterogeneous modality features, and extracting features from the image alone in single-image processing | — |
| H-LSTM+MLP | Explores the lateral relationships among images, text and their social links; the model is complementary and makes sentiment analysis more effective | Explores the multi-level correlations between images and their textual descriptions, and obtains complementary, comprehensive information for social image sentiment analysis | Applies only to specific links, and these links are unreliable on social media |
| MultiSentiNet | Proposes a visual-feature-guided attention LSTM that extracts the words most important for understanding the sentiment of a whole tweet, and aggregates the representations of these informative words with visual semantic features, objects and scenes | Extracts deep semantic features of images by identifying objects and scenes as salient features, showing that these features correlate highly with sentiment | Ignores the mutually reinforcing and complementary character of visual and textual information, and generally lacks a fine-grained architecture for handling multimodal content interaction |
Table 9 Advantages and disadvantages of consistent regression fusion algorithms
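Consistency-regression methods such as CNN-CCR constrain related but distinct modality predictions to agree. A rough sketch of such a constraint as an auxiliary squared-error penalty added to the main loss; the weighting and exact form here are illustrative assumptions, not the published formulation:

```python
import numpy as np

def consistency_penalty(joint_pred: np.ndarray, text_pred: np.ndarray,
                        image_pred: np.ndarray, lam: float = 0.5) -> float:
    """Auxiliary term added to the task loss: penalize disagreement between the
    joint prediction and each single-modality prediction, in the spirit of
    cross-modality consistent regression."""
    return lam * (float(np.sum((joint_pred - text_pred) ** 2))
                  + float(np.sum((joint_pred - image_pred) ** 2)))

# When all three predictions agree, the consistency penalty vanishes:
penalty = consistency_penalty(np.array([0.6, 0.4]),
                              np.array([0.6, 0.4]),
                              np.array([0.6, 0.4]))
```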