Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (12): 3020-3028. DOI: 10.3778/j.issn.1673-9418.2305077

• Artificial Intelligence · Pattern Recognition •

Robust Sentiment Analysis Model Based on Feature Representation in Uncertainty Domain

CHEN Jie, LI Shuai, ZHAO Shu, ZHANG Yanping

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, China

  • Online: 2023-12-01 Published: 2023-12-01

Abstract: Sentiment classification of text data often encounters fuzzy data that are difficult to classify. Because of their uncertainty, models tend to overfit these fuzzy data during training, which reduces robustness. Three-way decision theory divides the initial samples into a deterministic domain and an uncertain domain. How to select a suitable feature representation for the uncertain domain, where the fuzzy data lie, so that it serves downstream tasks is a current challenge for three-way decision sentiment analysis models. To address this challenge, a robust sentiment analysis model based on feature representation of the three-way decision uncertain domain (UFR-SA) is proposed. First, the deterministic domain and the uncertain domain are partitioned according to three-way decision theory; for fuzzy samples in the uncertain domain, heterogeneous sample point pairs are defined and multi-granularity feature representations are constructed. Second, a multi-feature fusion model is designed that feeds the multi-granularity feature representations into a multilayer perceptron network to combine the strengths of the features at each granularity. Finally, test samples are handled with a divide-and-conquer strategy: data in the deterministic domain keep their original feature representation, while fuzzy data in the uncertain domain use the fused robust feature representation. Experimental results on the SST-2, SST-5, and CR datasets show that UFR-SA effectively reduces the interference of fuzzy data with the model and outperforms the current best-performing models.

Key words: sentiment analysis, three-way decision, robustness, multi-granularity feature representation, feature fusion
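
The divide-and-conquer routing described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the two domains are computed, so everything specific here is an assumption made for illustration: the confidence-threshold rule, the threshold name `alpha` and its value, and the helper names `three_way_partition` and `route_features`. It is not the authors' implementation of UFR-SA.

```python
from typing import List, Sequence, Tuple


def three_way_partition(
    probs: Sequence[Sequence[float]], alpha: float = 0.8
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (deterministic, uncertain) domains.

    Assumption: a sample is 'deterministic' when its top-class probability
    reaches the hypothetical threshold alpha; otherwise it is 'uncertain'.
    """
    deterministic: List[int] = []
    uncertain: List[int] = []
    for i, p in enumerate(probs):
        if max(p) >= alpha:
            deterministic.append(i)
        else:
            uncertain.append(i)
    return deterministic, uncertain


def route_features(original: Sequence, fused: Sequence, uncertain_idx: Sequence[int]) -> list:
    """Keep original features for deterministic samples, fused ones for uncertain samples."""
    uncertain = set(uncertain_idx)
    return [fused[i] if i in uncertain else original[i] for i in range(len(original))]


# Toy class-probability rows for four samples (made-up values).
probs = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.48, 0.52]]
deterministic, uncertain = three_way_partition(probs, alpha=0.8)

# Placeholder feature labels stand in for real feature vectors.
routed = route_features(["o0", "o1", "o2", "o3"], ["f0", "f1", "f2", "f3"], uncertain)
```

In this toy run, samples 0 and 2 fall in the deterministic domain and samples 1 and 3 in the uncertain domain, so only the latter two receive the fused representation. Using the top-class probability keeps the rule applicable to both binary (SST-2, CR) and five-class (SST-5) settings, but other partition criteria are equally plausible.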