[1] XU G, YU Z, YAO H, et al. Chinese text sentiment analysis based on extended sentiment dictionary[J]. IEEE Access, 2019, 7: 43749-43762.
[2] ZHOU C, SUN C, LIU Z, et al. A C-LSTM neural network for text classification[J]. arXiv:1511.08630, 2015.
[3] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. arXiv:1606.06259, 2016.
[4] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[J]. arXiv:1707.07250, 2017.
[5] ZHANG X, CHEN Y, LI G. Multi-modal sarcasm detection based on contrastive attention mechanism[C]//Proceedings of the 2021 CCF International Conference on Natural Language Processing and Chinese Computing. Cham: Springer, 2021: 822-833.
[6] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. arXiv:1409.0473, 2014.
[7] WANG Y, HUANG M, ZHU X, et al. Attention-based LSTM for aspect-level sentiment classification[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2016: 606-615.
[8] LIU G, GUO J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification[J]. Neurocomputing, 2019, 337: 325-338.
[9] HUANG F, ZHANG X, ZHAO Z, et al. Image-text sentiment analysis via deep multimodal attentive fusion[J]. Knowledge-Based Systems, 2019, 167: 26-37.
[10] TRUONG Q T, LAUW H W. VistaNet: visual aspect attention network for multimodal sentiment analysis[C]//Proceedings of the 2019 AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019: 305-312.
[11] WANG L, XIONG Y, WANG Z, et al. Temporal segment networks: towards good practices for deep action recognition[C]//Proceedings of the 14th European Conference on Computer Vision. Cham: Springer, 2016: 20-36.
[12] WANG W, TRAN D, FEISZLI M. What makes training multi-modal classification networks hard?[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 12692-12702.
[13] SUN Y, MAI S, HU H. Learning to balance the learning rates between various modalities via adaptive tracking factor[J]. IEEE Signal Processing Letters, 2021, 28: 1650-1654.
[14] ISMAIL A A, HASAN M, ISHTIAQ F. Improving multimodal accuracy through modality pre-training and attention[J]. arXiv:2011.06102, 2020.
[15] XIONG C, ZHONG V, SOCHER R. Dynamic coattention networks for question answering[J]. arXiv:1611.01604, 2016.
[16] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1122-1131.
[17] PENG Z, LU Y, PAN S, et al. Efficient speech emotion recognition using multi-scale CNN and attention[C]//Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2021: 3020-3024.
[18] PORIA S, PENG H, HUSSAIN A, et al. Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis[J]. Neurocomputing, 2017, 261: 217-230.
[19] ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the 2018 AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2018: 5634-5641.
[20] LIANG P P, LIU Z, ZADEH A, et al. Multimodal language analysis with recurrent multistage fusion[J]. arXiv:1808.03920, 2018.
[21] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[J]. arXiv:1806.00064, 2018.
[22] MAI S, XING S, HU H. Locally confined modality fusion network with a global perspective for multimodal human affective computing[J]. IEEE Transactions on Multimedia, 2019, 22(1): 122-137.
[23] WINTERBOTTOM T, XIAO S, MCLEAN A, et al. On modality bias in the TVQA dataset[J]. arXiv:2012.10210, 2020.
[24] DU C, LI T, LIU Y, et al. Improving multi-modal learning with uni-modal teachers[J]. arXiv:2106.11059, 2021.
[25] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[26] GRAVES A, MOHAMED A R, HINTON G. Speech recognition with deep recurrent neural networks[C]//Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2013: 6645-6649.
[27] ZADEH A B, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2018: 2236-2246.
[28] TSAI Y H H, LIANG P P, ZADEH A, et al. Learning factorized multimodal representations[J]. arXiv:1806.06176, 2018.
[29] TSAI Y H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Conference of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 6558-6569.