Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (4): 909-916. DOI: 10.3778/j.issn.1673-9418.2105071

• Artificial Intelligence •

Bimodal Interactive Attention for Multimodal Sentiment Analysis

BAO Guangbin, LI Gangle+, WANG Guoxiong

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Received: 2021-05-19 Revised: 2021-08-03 Online: 2022-04-01 Published: 2021-08-05
  • Corresponding author: + E-mail: 1450316716@qq.com
  • About author: BAO Guangbin, born in 1975 in Lanzhou, Gansu, Ph.D., associate professor. His research interests include big data analysis and natural language processing.
    LI Gangle, born in 1997 in Jining, Shandong, M.S. Her research interest is natural language processing.
    WANG Guoxiong, born in 1997 in Longnan, Gansu, M.S. His research interest is natural language processing.
  • Supported by:
    National Natural Science Foundation of China (51668043); Natural Science Foundation of Gansu Province (18JR3RA156)

Abstract:

Existing multimodal sentiment analysis methods suffer from low sentiment classification accuracy and difficulty in effectively fusing multimodal features. To address these problems, this paper analyzes the dependencies between adjacent utterances and the interactions among the text, audio, and video modalities, and builds a multimodal sentiment analysis model that combines context with bimodal interactive attention. The model first uses a bidirectional gated recurrent unit (BiGRU) to capture the dependencies between utterances within each modality, yielding each modality's contextual information. To learn the interactive information across modalities, a bimodal interactive attention mechanism is proposed that fuses the information of two modalities and uses the result as a condition vector to weigh how much each modality contributes to sentiment classification. Self-attention and fully connected layers are then combined into a multimodal feature fusion module that mines the correlations within and between modalities and produces cross-modal joint features. Finally, the contextual features and the cross-modal joint features are concatenated, passed through a fully connected layer, and fed to a Softmax layer for the final sentiment classification. The proposed model is evaluated on the public multimodal sentiment analysis dataset CMU-MOSI (CMU multimodal opinion-level sentiment intensity). Experimental results show that the model is effective and outperforms existing models on the multimodal sentiment classification task.
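
The abstract describes a concrete pipeline: per-modality BiGRU context encoders, bimodal interactive attention over each modality pair, a self-attention fusion module, and a final concatenation fed through a fully connected layer and Softmax. The sketch below is a minimal PyTorch reading of that pipeline, not the authors' implementation: every name (ContextBiGRU, BimodalInteractiveAttention, BIAModel), the layer sizes, the scaled dot-product form of the pairwise attention, and the binary output are assumptions, and the condition-vector weighting of per-modality importance is simplified to a summed cross-attention.

```python
# Minimal sketch only: class names, layer sizes, and the exact attention form
# are assumptions; the paper's implementation is not reproduced on this page.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextBiGRU(nn.Module):
    """BiGRU over one modality's utterance sequence -> contextual features."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, x):                     # x: (batch, utterances, in_dim)
        out, _ = self.gru(x)                  # out: (batch, utterances, 2*hidden_dim)
        return out


class BimodalInteractiveAttention(nn.Module):
    """Scaled dot-product cross-attention between two modalities; one possible
    reading of the 'bimodal interactive attention' named in the abstract."""

    def __init__(self, dim: int):
        super().__init__()
        self.scale = dim ** -0.5

    def forward(self, a, b):                  # a, b: (batch, utterances, dim)
        attn_ab = F.softmax(a @ b.transpose(1, 2) * self.scale, dim=-1)
        attn_ba = F.softmax(b @ a.transpose(1, 2) * self.scale, dim=-1)
        # Each modality attends to the other; the sum is the fused pair feature.
        return attn_ab @ b + attn_ba @ a


class BIAModel(nn.Module):
    """BiGRU context + pairwise interactive attention + self-attention fusion."""

    def __init__(self, dims, hidden_dim=128, n_classes=2):
        super().__init__()
        # One BiGRU context encoder per modality (text, audio, video).
        self.encoders = nn.ModuleList(ContextBiGRU(i, hidden_dim) for i in dims)
        d = 2 * hidden_dim
        self.bia = BimodalInteractiveAttention(d)
        self.fuse = nn.Linear(3 * d, d)       # merge the three bimodal features
        self.self_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.classify = nn.Linear(4 * d, n_classes)

    def forward(self, text, audio, video):
        t, a, v = (enc(x) for enc, x in zip(self.encoders, (text, audio, video)))
        # Bimodal interactive attention over every modality pair.
        pairs = torch.cat([self.bia(t, a), self.bia(t, v), self.bia(a, v)], dim=-1)
        joint = torch.tanh(self.fuse(pairs))
        # Self-attention mines intra- and inter-modal correlations.
        joint, _ = self.self_attn(joint, joint, joint)
        # Concatenate contextual and cross-modal joint features, then classify.
        features = torch.cat([t, a, v, joint], dim=-1)
        return F.log_softmax(self.classify(features), dim=-1)


if __name__ == "__main__":
    # Toy check: 8 videos, 20 utterances each; feature widths are placeholders.
    model = BIAModel(dims=(300, 74, 35))
    out = model(torch.randn(8, 20, 300), torch.randn(8, 20, 74), torch.randn(8, 20, 35))
    print(out.shape)                          # torch.Size([8, 20, 2])
```

With CMU-MOSI, the three inputs would be utterance-level text, audio, and video feature sequences; the binary output assumes positive/negative opinion-level labels, and the feature widths in the toy check are placeholders rather than the dataset's actual dimensions.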

Key words: multimodal, sentiment analysis, bidirectional gated recurrent unit (BiGRU), context, bimodal interactive attention, feature fusion

CLC Number: