[1] CHEN M, WANG S, LIANG P, et al. Multimodal sentiment analysis with word-level fusion and reinforcement learning[C]//Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, Nov 13-17, 2017. New York: ACM, 2017: 163-171.
[2] HAZARIKA D, ZIMMERMANN R, PORIA S. Modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia, Seattle, Oct 12-16, 2020. New York: ACM, 2020: 1122-1131.
[3] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[EB/OL]. [2023-05-13]. https://arxiv.org/abs/1707.07250.
[4] ZHANG Y, SONG D, ZHANG P, et al. A quantum-inspired multimodal sentiment analysis framework[J]. Theoretical Computer Science, 2018, 752: 21-40.
[5] ZHANG T. Recent trends in neural networks for multimedia processing[C]//Proceedings of the 6th Seminar on Neural Network Applications in Electrical Engineering, Belgrade, Sep 26-28, 2002. Piscataway: IEEE, 2002: 41-45.
[6] GU J, WANG Z, KUEN J, et al. Recent advances in convolutional neural networks[J]. Pattern Recognition, 2018, 77: 354-377.
[7] SCHMIDHUBER J. Deep learning in neural networks: an overview[J]. Neural Networks, 2015, 61: 85-117.
[8] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, Long Beach, Dec 4-9, 2017: 5998-6008.
[9] PORIA S, CAMBRIA E, HAZARIKA D, et al. Multi-level multiple attentions for contextual multimodal sentiment analysis[C]//Proceedings of the 2017 IEEE International Conference on Data Mining, New Orleans, Nov 18-21, 2017. Piscataway: IEEE, 2017: 1033-1038.
[10] 宋绪靖. 基于文本、语音和视频的多模态情感识别的研究[D]. 济南: 山东大学, 2019: 1-57.
SONG X J. The study of multimodal emotion recognition based on text, speech and video[D]. Jinan: Shandong University, 2019: 1-57.
[11] WU Y, LIN Z, ZHAO Y, et al. A text-centered shared-private framework via cross-modal prediction for multimodal sentiment analysis[C]//Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug 1-6, 2021. Stroudsburg: ACL, 2021: 4730-4738.
[12] ZHANG K, LI Y, WANG J, et al. Real-time video emotion recognition based on reinforcement learning and domain knowledge[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1034-1047.
[13] 张亚洲, 戎璐, 宋大为, 等. 多模态情感分析研究综述[J]. 模式识别与人工智能, 2020, 33(5): 426-438.
ZHANG Y Z, RONG L, SONG D W, et al. A survey on multimodal sentiment analysis[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(5): 426-438.
[14] WILLIAMS J, KLEINEGESSE S, COMANESCU R, et al. Recognizing emotions in video using multimodal DNN feature fusion[C]//Proceedings of the 2018 Grand Challenge Workshop on Human Multimodal Language, Melbourne, Jul 20, 2018. Stroudsburg: ACL, 2018: 11-19.
[15] ABDU S A, YOUSEF A H, SALEM A. Multimodal video sentiment analysis using deep learning approaches, a survey[J]. Information Fusion, 2021, 76: 204-226.
[16] LI R, ZHAO J, HU J, et al. Multi-modal fusion for video sentiment analysis[C]//Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop, Seattle, Oct 16, 2020. New York: ACM, 2020: 19-25.
[17] 蔡国永, 吕光瑞, 徐智, 等. 基于层次化深度关联融合网络的社交媒体情感分类[J]. 计算机研究与发展, 2019, 56(6): 1312-1324.
CAI G Y, LYU G R, XU Z, et al. A hierarchical deep correlative fusion network for sentiment classification in social media[J]. Journal of Computer Research and Development, 2019, 56(6): 1312-1324.
[18] 林敏鸿, 蒙祖强. 基于注意力神经网络的多模态情感分析[J]. 计算机科学, 2020, 47(11): 508-514.
LIN M H, MENG Z Q. Multimodal sentiment analysis based on attention neural network[J]. Computer Science, 2020, 47(11): 508-514.
[19] 刘启元, 张栋, 吴良庆, 等. 基于上下文增强LSTM的多模态情感分析[J]. 计算机科学, 2019, 46(11): 181-185.
LIU Q Y, ZHANG D, WU L Q, et al. Multi-modal sentiment analysis with context-augmented LSTM[J]. Computer Science, 2019, 46(11): 181-185.
[20] YAN X, XUE H, JIANG S, et al. Multimodal sentiment analysis using multi-tensor fusion network with cross-modal modeling[J]. Applied Artificial Intelligence, 2021, 36(1).
[21] GKOUMAS D, LI Q, LIOMA C, et al. What makes the difference? An empirical comparison of fusion strategies for multimodal language analysis[J]. Information Fusion, 2021, 66: 184-197.
[22] 刘宇宸, 宗成庆. 跨模态信息融合的端到端语音翻译[J]. 软件学报, 2023, 34(4): 1837-1849.
LIU Y C, ZONG C Q. End-to-end speech translation by integrating cross-modal information[J]. Journal of Software, 2023, 34(4): 1837-1849.
[23] YANG B, SHAO B, WU L, et al. Multimodal sentiment analysis with unidirectional modality translation[J]. Neurocomputing, 2022, 467: 130-137.
[24] TSAI Y H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Jul 28-Aug 2, 2019. Stroudsburg: ACL, 2019: 6558-6569.
[25] WANG Z, WAN Z, WAN X. TransModality: an end2end fusion method with transformer for multimodal sentiment analysis[C]//Proceedings of the Web Conference 2020, Taipei, China, Apr 20-24, 2020. New York: ACM, 2020: 2514-2520.
[26] WANG F, TIAN S, YU L, et al. Transformer-based encoding-decoding translation network for multimodal sentiment analysis[J]. Cognitive Computation, 2023, 15(1): 289-303.
[27] HUDDAR M, SANNAKKI S, RAJPUROHIT V. Multi-level context extraction and attention-based contextual inter-modal fusion for multimodal sentiment analysis and emotion classification[J]. International Journal of Multimedia Information Retrieval, 2020, 9(2): 103-112.
[28] XI C, LU G, YAN J. Multimodal sentiment analysis based on multi-head attention mechanism[C]//Proceedings of the 4th International Conference on Machine Learning and Soft Computing, Haiphong City, Jan 17-19, 2020. New York: ACM, 2020: 34-39.
[29] OLSON D. From utterance to text: the bias of language in speech and writing[J]. Harvard Educational Review, 1977, 47(3): 257-281.
[30] ZHI Y, TONG Z, WANG L, et al. MGSampler: an explainable sampling strategy for video action recognition[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 1513-1522.
[31] ZHANG K, ZHANG Z, LI Z, et al. Joint face detection and alignment using multitask cascaded convolutional networks[J]. IEEE Signal Processing Letters, 2016, 23(10): 1499-1503.
[32] ZADEH A, LIANG P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Jul 15-20, 2018. Stroudsburg: ACL, 2018: 2236-2246.
[33] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[EB/OL]. [2023-05-13]. https://arxiv.org/abs/1606.06259.
[34] ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2018: 5634-5641.
[35] TSAI Y H H, LIANG P P, ZADEH A, et al. Learning factorized multimodal representations[C]//Proceedings of the 7th International Conference on Learning Representations, New Orleans, May 6-9, 2019.
[36] HAZARIKA D, ZIMMERMANN R, PORIA S. Modality-invariant and -specific representations for multimodal sentiment analysis[EB/OL]. [2023-05-13]. https://doi.org/10.1145/3394171.3413678.
[37] YU W, XU H, YUAN Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2021: 10790-10797.
[38] HAN W, CHEN H, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Nov 7-11, 2021. Stroudsburg: ACL, 2021: 9180-9192.
[39] 吕学强, 田驰, 张乐, 等. 融合多特征和注意力机制的多模态情感分析模型[J]. 数据分析与知识发现, 2024, 8(5): 91-101.
LYU X Q, TIAN C, ZHANG L, et al. Multimodal sentiment analysis model integrating multi-features and attention mechanism[J]. Data Analysis and Knowledge Discovery, 2024, 8(5): 91-101.
[40] SUN H, WANG H, LIU J, et al. CubeMLP: an MLP-based model for multimodal sentiment analysis and depression estimation[C]//Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Oct 10-14, 2022. New York: ACM, 2022: 3722-3729.