Journal of Frontiers of Computer Science and Technology ›› 2016, Vol. 10 ›› Issue (7): 989-994. DOI: 10.3778/j.issn.1673-9418.1507044

• Artificial Intelligence and Pattern Recognition •

Sentiment Polarity Discrimination Method Based on Topic Clustering

LI Tianchen1+, YIN Jianping2

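  1. College of Computer, National University of Defense Technology, Changsha 410073, China
  2. State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China
  • Publication date: 2016-07-01 Release date: 2016-07-01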

Sentiment Polarity Discrimination Method Based on Topic Clustering

LI Tianchen1+, YIN Jianping2   

  1. College of Computer, National University of Defense Technology, Changsha 410073, China
  2. State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China
  • Online:2016-07-01 Published:2016-07-01

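Abstract: Most current methods for determining the sentiment polarity of text extract sentiment features and feed them to a classifier. However, web text is expressed in diverse ways and its topics are scattered, which makes sentiment feature extraction increasingly difficult. Using the LDA (latent Dirichlet allocation) topic model, the method first clusters the texts by topic, then applies a recurrent neural network within each topic subclass to build separate language models for the positive and negative sentiment samples, and finally makes a joint decision based on the probability of each topic and the probability of each sentiment. Dividing the texts into subclasses regularizes how texts are expressed under different topics and limits the problem of word meanings changing across topics. Training language models also sidesteps the difficulty of extracting features directly, because feature discovery is internalized in model training. Experiments on IMDB movie review samples show that classification accuracy improves significantly once topic clustering is applied.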

Keywords: sentiment analysis, topic model, recurrent neural network

Abstract: Most state-of-the-art methods for sentiment analysis extract sentiment features and feed them to a classifier. However, because web texts vary widely in expression and are scattered across topics, extracting suitable sentiment features is difficult. This paper proposes a novel algorithm to address these problems. First, the original texts are clustered by topic with the LDA (latent Dirichlet allocation) model. Then, for each topic subset, language models are trained separately on the positive and negative samples using a recurrent neural network. Finally, the topic probability and the sentiment probability are combined to determine the sentiment polarity of a text. Dividing the texts into topic subcategories regularizes how texts are expressed within each topic and limits shifts in word meaning across topics. Training language models avoids the difficulty of extracting features directly, because feature discovery is internalized in model training. Experimental results on the IMDB movie review dataset show that topic clustering significantly improves classification accuracy.

Key words: sentiment analysis, topic model, recurrent neural network
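
The joint decision described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact combination rule, so the code below assumes one plausible form: each topic's sentiment posterior is computed from the log-likelihoods of the positive and negative language models (with equal class priors, an assumption), and the posteriors are averaged with the document's topic probabilities as weights. The function names, the inputs, and the 0.5 threshold are hypothetical; the LDA inference and the recurrent neural network language models are not implemented here, and their outputs are passed in as plain numbers.

```python
import math
from typing import Sequence, Tuple


def sentiment_posterior(loglik_pos: float, loglik_neg: float) -> float:
    """P(positive | document, topic) from two language-model log-likelihoods.

    Assumes equal priors for the two sentiment classes (an assumption,
    not something stated in the abstract).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(loglik_pos, loglik_neg)
    pos = math.exp(loglik_pos - m)
    neg = math.exp(loglik_neg - m)
    return pos / (pos + neg)


def joint_positive_probability(
    topic_probs: Sequence[float],
    logliks: Sequence[Tuple[float, float]],
) -> float:
    """Weight each topic's sentiment posterior by the topic probability.

    topic_probs[t] : P(topic t | document), e.g. from an LDA model.
    logliks[t]     : (log P(doc | topic t, positive),
                      log P(doc | topic t, negative)), e.g. from the
                     per-topic language models.
    """
    if len(topic_probs) != len(logliks):
        raise ValueError("need one log-likelihood pair per topic")
    total = sum(topic_probs)  # renormalize in case the weights do not sum to 1
    return sum(
        (p_topic / total) * sentiment_posterior(ll_pos, ll_neg)
        for p_topic, (ll_pos, ll_neg) in zip(topic_probs, logliks)
    )


def classify(
    topic_probs: Sequence[float],
    logliks: Sequence[Tuple[float, float]],
) -> str:
    """Label a document by thresholding the joint probability at 0.5."""
    score = joint_positive_probability(topic_probs, logliks)
    return "positive" if score >= 0.5 else "negative"


# Illustrative, made-up numbers for a document spread over two topics.
example_topic_probs = [0.7, 0.3]
example_logliks = [(-10.0, -12.0), (-15.0, -14.0)]
example_label = classify(example_topic_probs, example_logliks)
```

In this made-up example the dominant topic favors the positive model, so the weighted vote comes out positive even though the minor topic leans negative. Other combination rules, such as using only the most probable topic, would fit the abstract's description equally well.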