[1] ZHU L, ZHU Z, ZHANG C, et al. Multimodal sentiment analysis based on fusion methods: a survey[J]. Information Fusion, 2023, 95: 306-325.
[2] GANDHI A, ADHVARYU K, PORIA S, et al. Multimodal sentiment analysis: a systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions[J]. Information Fusion, 2023, 91: 424-444.
[3] 刘继明, 张培翔, 刘颖, 等. 多模态的情感分析技术综述[J]. 计算机科学与探索, 2021, 15(7): 1165-1182.
LIU J M, ZHANG P X, LIU Y, et al. Summary of multi-modal sentiment analysis technology[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(7): 1165-1182.
[4] 之江实验室. 情感计算白皮书[EB/OL]. (2022-12-09)[2023-09-08]. https://www.zhejianglab.com/uploadfile/20221208/1670465654902617.pdf.
Zhejiang Lab. White paper on affective computing[EB/OL]. (2022-12-09)[2023-09-08]. https://www.zhejianglab.com/uploadfile/20221208/1670465654902617.pdf.
[5] LIANG P P, ZADEH A, MORENCY L P. Foundations and trends in multimodal machine learning: principles, challenges, and open questions[J]. ACM Computing Surveys, 2024, 56(10): 264.
[6] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Sep 9-11, 2017. Stroudsburg: ACL, 2017: 1103-1114.
[7] HAZARIKA D, ZIMMERMANN R, PORIA S, et al. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[EB/OL]. [2023-09-08]. https://arxiv.org/abs/2005.03545.
[8] SUN H, WANG H, LIU J, et al. CubeMLP: an MLP-based model for multimodal sentiment analysis and depression estimation[C]//Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Oct 10-14, 2022. New York: ACM, 2022: 3722-3729.
[9] CHEN M, WANG S, LIANG P P, et al. Multimodal sentiment analysis with word-level fusion and reinforcement learning[C]//Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, Nov 13-17, 2017. New York: ACM, 2017: 163-171.
[10] WANG Y, SHEN Y, LIU Z, et al. Words can shift: dynamically adjusting word representations using nonverbal behaviors[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence and the 31st Innovative Applications of Artificial Intelligence Conference and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, Jan 27-Feb 1, 2019. Menlo Park: AAAI, 2019: 7216-7223.
[11] ABDU S A, YOUSEF A H, SALEM A. Multimodal video sentiment analysis using deep learning approaches: a survey[J]. Information Fusion, 2021, 76: 204-226.
[12] RAHMAN W, HASAN M K, LEE S, et al. Integrating multi-modal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 2359-2369.
[13] ZADEH A, LIANG P P, PORIA S, et al. Multi-attention recurrent network for human communication comprehension[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, Feb 2-7, 2018. Menlo Park: AAAI, 2018: 5642-5649.
[14] ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, the 30th Innovative Applications of Artificial Intelligence, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, Feb 2-7, 2018. Menlo Park: AAAI, 2018: 5634-5641.
[15] LIANG P P, LIU Z, BAGHER ZADEH A, et al. Multimodal language analysis with recurrent multistage fusion[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Oct 31-Nov 4, 2018. Stroudsburg: ACL, 2018: 150-161.
[16] TSAI Y H, BAI S J, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Jul 28-Aug 2, 2019. Stroudsburg: ACL, 2019: 6558-6569.
[17] HAN W, CHEN H, GELBUKH A, et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]//Proceedings of the 2021 International Conference on Multimodal Interaction, Montréal, Oct 18-22, 2021. New York: ACM, 2021: 6-15.
[18] GUO J, TANG J, DAI W, et al. Dynamically adjust word representations using unaligned multimodal information[C]//Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Oct 10-14, 2022. New York: ACM, 2022: 3394-3402.
[19] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, Long Beach, Dec 4-9, 2017: 5998-6008.
[20] MAI S, HU H, XING S. Divide, conquer and combine: hierarchical feature fusion network with local and global perspectives for multimodal affective computing[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Jul 28-Aug 2, 2019. Stroudsburg: ACL, 2019: 481-492.
[21] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[EB/OL]. [2023-09-08]. https://arxiv.org/abs/1606.06259.
[22] ZADEH A, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Jul 15-20, 2018. Stroudsburg: ACL, 2018: 2236-2246.
[23] YU W, XU H, YUAN Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence, Feb 2-9, 2021. Menlo Park: AAAI, 2021: 10790-10797.
[24] YU W, XU H, MENG F, et al. CH-SIMS: a Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul 6-8, 2020. Stroudsburg: ACL, 2020: 3718-3727.
[25] LIU Y, YUAN Z, MAO H, et al. Make acoustic and visual cues matter: CH-SIMS v2.0 dataset and AV-Mixup consistent module[C]//Proceedings of the 2022 International Conference on Multimodal Interaction, Bengaluru, Nov 7-11, 2022. New York: ACM, 2022: 247-258.
[26] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[EB/OL]. [2023-09-08]. https://arxiv.org/abs/1806.00064.
[27] HAN W, CHEN H, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov 7-11, 2021. Stroudsburg: ACL, 2021: 9180-9192.
[28] ZOU W, DING J, WANG C. Utilizing BERT intermediate layers for multimodal sentiment analysis[C]//Proceedings of the 2022 IEEE International Conference on Multimedia and Expo, Taipei, China, Jul 18-22, 2022. Piscataway: IEEE, 2022: 1-6.
[29] WILLIAMS J, KLEINEGESSE S, COMANESCU R, et al. Recognizing emotions in video using multimodal DNN feature fusion[C]//Proceedings of the 2018 Grand Challenge and Workshop on Human Multimodal Language, Melbourne, Jul 20, 2018. Stroudsburg: ACL, 2018: 11-19.
[30] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2023-09-08]. https://arxiv.org/abs/1810.04805.
[31] DEGOTTEX G, KANE J, DRUGMAN T, et al. COVAREP—a collaborative voice analysis repository for speech technologies[C]//Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, May 4-9, 2014. Piscataway: IEEE, 2014: 960-964.
[32] EKMAN P, ROSENBERG E L. What the face reveals: basic and applied studies of spontaneous expression using the facial action coding system (FACS)[M]. New York: Oxford University Press, 2005.
[33] LIN H, ZHANG P, LING J, et al. PS-Mixer: a polar-vector and strength-vector mixer model for multimodal sentiment analysis[J]. Information Processing & Management, 2023, 60(2): 103229.
[34] KIM K, PARK S. AOBERT: all-modalities-in-one BERT for multimodal sentiment analysis[J]. Information Fusion, 2023, 92: 37-45.