[1] 张亚洲, 戎璐, 宋大为, 等. 多模态情感分析研究综述[J]. 模式识别与人工智能, 2020, 33(5): 426-438.
ZHANG Y Z, RONG L, SONG D W, et al. A survey on multimodal sentiment analysis[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(5): 426-438.
[2] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. arXiv:1606.06259, 2016.
[3] PORIA S, CAMBRIA E, HAZARIKA D, et al. Multi-level multiple attentions for contextual multimodal sentiment analysis[C]//Proceedings of the 2017 IEEE International Conference on Data Mining, New Orleans, Nov 18-21, 2017. Piscataway: IEEE, 2017: 1033-1038.
[4] 刘继明, 张培翔, 刘颖, 等. 多模态的情感分析技术综述[J]. 计算机科学与探索, 2021, 15(7): 1165-1182.
LIU J M, ZHANG P X, LIU Y, et al. Summary of multi-modal sentiment analysis technology[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(7): 1165-1182.
[5] NOJAVANASGHARI B, GOPINATH D, KOUSHIK J, et al. Deep multimodal fusion for persuasiveness prediction[C]//Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Nov 12-16, 2016. New York: ACM, 2016: 284-288.
[6] WOLLMER M, WENINGER F, KNAUP T, et al. YouTube movie reviews: sentiment analysis in an audio-visual context[J]. IEEE Intelligent Systems, 2013, 28(3): 46-53.
[7] KIM Y. Convolutional neural networks for sentence classification[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Oct 25-29, 2014. Stroudsburg: ACL, 2014: 1746-1751.
[8] TANG D, QIN B, LIU T. Document modeling with gated recurrent neural network for sentiment classification[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Sep 17-21, 2015. Stroudsburg: ACL, 2015: 1422-1432.
[9] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[10] CAMBRIA E, HAZARIKA D, PORIA S, et al. Benchmarking multimodal sentiment analysis[C]//LNCS 10762: Proceedings of the 18th International Conference on Computational Linguistics and Intelligent Text Processing, Budapest, Apr 17-23, 2017. Cham: Springer, 2017: 166-179.
[11] WILLIAMS J, KLEINEGESSE S, COMANESCU R, et al. Recognizing emotions in video using multimodal DNN feature fusion[C]//Proceedings of the 2018 Grand Challenge and Workshop on Human Multimodal Language, Melbourne, Jul 20, 2018. Stroudsburg: ACL, 2018: 11-19.
[12] ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, the 30th Innovative Applications of Artificial Intelligence, and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, Feb 2-7, 2018. Menlo Park: AAAI, 2018: 5634-5641.
[13] ZADEH A, CHEN M H, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Sep 9-11, 2017. Stroudsburg: ACL, 2017: 1103-1114.
[14] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[J]. arXiv:1806.00064, 2018.
[15] TSAI Y H, BAI S J, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Jul 28-Aug 2, 2019. Stroudsburg: ACL, 2019: 6558-6569.
[16] SHENOY A, SARDANA A. Multilogue-Net: a context aware RNN for multi-modal emotion detection and sentiment analysis in conversation[J]. arXiv:2002.08267, 2020.
[17] HAZARIKA D, ZIMMERMANN R, PORIA S, et al. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[J]. arXiv:2005.03545, 2020.
[18] MAJUMDER N, HAZARIKA D, GELBUKH A, et al. Multimodal sentiment analysis using hierarchical fusion with context modeling[J]. Knowledge-Based Systems, 2018, 161: 124-133.
[19] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. arXiv:1409.0473, 2014.
[20] ZADEH A, ZELLERS R, PINCUS E, et al. Multimodal sentiment intensity analysis in videos: facial gestures and verbal messages[J]. IEEE Intelligent Systems, 2016, 31(6): 82-88.
[21] ZADEH A, LIANG P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Jul 15-20, 2018. Stroudsburg: ACL, 2018: 2236-2246.
[22] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv:1810.04805, 2018.
[23] MCFEE B, RAFFEL C, LIANG D, et al. LibROSA: audio and music signal analysis in Python[C]//Proceedings of the 14th Python in Science Conference, Austin, Jul 6-12, 2015: 18-25.
[24] ZHANG W L, LI R J, ZENG T, et al. Deep model based transfer and multi-task learning for biological image analysis[J]. IEEE Transactions on Big Data, 2016, 6(2): 322-333.
[25] BALTRUSAITIS T, ZADEH A, LIM Y C, et al. OpenFace 2.0: facial behavior analysis toolkit[C]//Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition, Xi'an, May 15-19, 2018. Washington: IEEE Computer Society, 2018: 59-66.
[26] KINGMA D P, BA J. Adam: a method for stochastic optimization[J]. arXiv:1412.6980, 2014.
[27] SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.