Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (11): 3041-3050. DOI: 10.3778/j.issn.1673-9418.2309071
• Artificial Intelligence · Pattern Recognition •
SUN Jie, CHE Wengang, GAO Shengxiang
Online: 2024-11-01
Published: 2024-10-31
SUN Jie, CHE Wengang, GAO Shengxiang. Multi-channel Temporal Convolution Fusion for Multimodal Sentiment Analysis[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(11): 3041-3050.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2309071
[1] LI Mengyun, ZHANG Jing, ZHANG Huanxiang, ZHANG Xiaolin, LIU Luyao. Multimodal Sentiment Analysis Based on Cross-Modal Semantic Information Enhancement[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2476-2486.
[2] ZHU Weiwei, ZHANG Yijia, LIU Guantong, LU Mingyu, LIN Hongfei. Psychological Analysis of College Students' Anxiety Based on Domain Comparison Adaptive Model[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(7): 1900-1910.
[3] YANG Li, ZHONG Junhong, ZHANG Yun, SONG Xinyu. Temporal Multimodal Sentiment Analysis with Composite Cross Modal Interaction Network[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(5): 1318-1327.
[4] WANG Xiang, MAO Li, CHEN Qidong, SUN Jun. Sentiment Analysis Combining Dynamic Gradient and Multi-view Co-attention[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(5): 1328-1338.
[5] ZHOU Yan, LI Wenjun, DANG Zhaolong, ZENG Fanzhi, YE Dewang. Survey of 3D Model Recognition Based on Deep Learning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(4): 916-929.
[6] HAN Kun, PAN Hongpeng, LIU Zhongyi. Research on Sentiment Analysis of Short Video Network Public Opinion by Integrating BERT Multi-level Features[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(4): 1010-1020.
[7] WU Peichen, YUAN Lining, GUO Fang, LIU Zhao. Video Anomaly Detection Methods: a Survey[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(12): 3100-3125.
[8] LIU Jun, LENG Fangling, WU Wangwang, BAO Yubin. Construction Method of Textbook Knowledge Graph Based on Multimodal and Knowledge Distillation[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(11): 2901-2911.
[9] ZHANG Hucheng, LI Leixiao, LIU Dongjiang. Survey of Multimodal Data Fusion Research[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(10): 2501-2520.
[10] LI Jin, XIA Hongbin, LIU Yuan. Dual Features Local-Global Attention Model with BERT for Aspect Sentiment Analysis[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(1): 205-216.
[11] ZHANG Wenxuan, YIN Yanjun, ZHI Min. Affection Enhanced Dual Graph Convolution Network for Aspect Based Sentiment Analysis[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(1): 217-230.
[12] JIANG Hongxun, ZHANG Lin, SUN Caihong. Knowledge Graph-Based Video Classification Algorithm for Film and Television Drama[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(1): 161-174.
[13] CAO Yingli, DENG Zhaohong, HU Shudong, WANG Shitong. Classification of Alzheimer's Disease Integrating Individual Feature and Fusion Feature[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(7): 1658-1668.
[14] XIA Hongbin, LI Qiang, LIU Yuan. Local and Global Feature Fusion Network Model for Aspect-Based Sentiment Analysis[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(4): 902-911.
[15] HAN Hu, HAO Jun, ZHANG Qiankun, MENG Tiantian. Knowledge-Enhanced Interactive Attention Model for Aspect-Based Sentiment Analysis[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(3): 709-718.