Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (5): 1328-1338. DOI: 10.3778/j.issn.1673-9418.2301042

• Artificial Intelligence · Pattern Recognition •

Sentiment Analysis Combining Dynamic Gradient and Multi-view Co-attention

WANG Xiang, MAO Li, CHEN Qidong, SUN Jun   

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
    2. College of Internet of Things Engineering, Wuxi University, Wuxi, Jiangsu 210044, China
  • Online: 2024-05-01    Published: 2024-04-29

Abstract: To address the problems of unbalanced inter-modal optimization and inadequate multimodal feature fusion in multimodal sentiment analysis, a multimodal sentiment analysis model combining a dynamic gradient mechanism and a multi-view co-attention mechanism (DG-MCM) is proposed, which can effectively mine unimodal representations and fully integrate multimodal information. Firstly, the model uses the pre-trained model BERT (bidirectional encoder representations from transformers) and stacked long short-term memory networks (SLSTM) to learn text, audio, and video features, and introduces a dynamic gradient mechanism that assists the feature learning of each modality by monitoring differences in each modality's contribution to the learning objective and in its learning speed. Secondly, the features of the different modalities are fused with a multi-view co-attention mechanism: each pair of modalities is projected into multiple subspaces where they interact, yielding more adequate fusion features. Finally, the fusion features and unimodal features are concatenated for sentiment prediction. Experimental results on the CMU-MOSI and CMU-MOSEI datasets show that the model can fully learn both unimodal information and the interactions between modalities, effectively improving the accuracy of multimodal sentiment analysis.
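The abstract gives no implementation details, so the two sketches below are minimal, hypothetical PyTorch illustrations of the mechanisms it describes, not the authors' code. All names and hyperparameters (modulate_gradients, alpha, MultiViewCoAttention, d_view, n_views) are assumptions made for illustration.

A dynamic gradient mechanism of the kind described, monitoring each modality's contribution and attenuating the gradients of dominant modalities so that slower ones are not starved during joint optimization, might look like this (assuming a per-modality classification head):

```python
import math
import torch

def modulate_gradients(modal_logits, labels, encoders, alpha=1.0):
    """Hypothetical sketch of a dynamic gradient mechanism.

    modal_logits: dict of modality name -> (batch, n_classes) logits
    encoders:     dict of modality name -> that modality's encoder module
    Call after loss.backward() and before optimizer.step().
    """
    # Mean probability assigned to the true class, used as a proxy for
    # each modality's current contribution to the learning objective.
    scores = {}
    for name, logits in modal_logits.items():
        probs = torch.softmax(logits.detach(), dim=1)
        scores[name] = probs.gather(1, labels.unsqueeze(1)).mean().item()

    mean_score = sum(scores.values()) / len(scores)
    for name, encoder in encoders.items():
        ratio = scores[name] / mean_score
        # A modality learning faster than average (ratio > 1) has its
        # gradients scaled down; the others pass through unchanged.
        coeff = 1.0 - math.tanh(alpha * max(ratio - 1.0, 0.0))
        for p in encoder.parameters():
            if p.grad is not None:
                p.grad.mul_(coeff)
```

For the fusion step, projecting a pair of modalities into multiple spaces for interaction can be sketched as cross-attention computed independently in several learned subspaces ("views") whose outputs are concatenated:

```python
import torch
import torch.nn as nn

class MultiViewCoAttention(nn.Module):
    """Hypothetical sketch: modality A attends to modality B in each of
    n_views learned subspaces; the per-view results are concatenated."""

    def __init__(self, dim_a, dim_b, d_view=64, n_views=4):
        super().__init__()
        self.proj_a = nn.ModuleList(nn.Linear(dim_a, d_view) for _ in range(n_views))
        self.proj_b = nn.ModuleList(nn.Linear(dim_b, d_view) for _ in range(n_views))
        self.scale = d_view ** -0.5

    def forward(self, x_a, x_b):
        # x_a: (batch, len_a, dim_a); x_b: (batch, len_b, dim_b)
        fused = []
        for pa, pb in zip(self.proj_a, self.proj_b):
            a, b = pa(x_a), pb(x_b)                       # one projection "view"
            attn = torch.softmax(a @ b.transpose(1, 2) * self.scale, dim=-1)
            fused.append(attn @ b)                        # A attends to B
        return torch.cat(fused, dim=-1)                   # (batch, len_a, n_views * d_view)
```

Per the abstract, a full DG-MCM pipeline would apply such a block to each modality pair (text-audio, text-video, audio-video) and concatenate the fused outputs with the unimodal features before the prediction head.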

Key words: sentiment analysis, multimodal, attention mechanism, feature fusion