Sentiment Polarity Discrimination Method Based on Topic Clustering

Journal of Frontiers of Computer Science and Technology, 2016, Vol. 10, Issue (7): 989-994. DOI: 10.3778/j.issn.1673-9418.1507044

LI Tianchen1+, YIN Jianping2

1. College of Computer, National University of Defense Technology, Changsha 410073, China
2. State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China

Online: 2016-07-01  Published: 2016-07-01

Abstract

Most current methods for judging the sentiment polarity of a text extract sentiment features and feed them to a classifier. However, online text is expressed in diverse ways and its topics are scattered, which makes extracting suitable sentiment features increasingly difficult. This paper addresses the problem with the LDA (latent Dirichlet allocation) topic model. First, the texts are clustered by topic with LDA. Then, within each topic cluster, separate language models are trained on the positive and the negative samples using a recurrent neural network. Finally, the probability that a text belongs to each topic and the probability that it carries each sentiment are combined to judge its polarity. Dividing the data into topic clusters makes the wording within each cluster more uniform and limits shifts in word meaning across topics. Training language models avoids the difficulty of extracting features directly, because feature mining is absorbed into model training. Experiments on IMDB movie reviews show that classification accuracy improves significantly once topic clustering is applied.

Key words: sentiment analysis, topic model, recurrent neural network
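
The three-step pipeline in the abstract can be illustrated with a small sketch. The abstract does not state the exact rule for combining the topic and sentiment probabilities, so the code below assumes one plausible form: score(s | d) = sum over topics t of P(t | d) * P(d | LM(t, s)). Two further simplifications are assumptions made only to keep the example self-contained: an add-one-smoothed unigram model stands in for the recurrent neural network language model, and the topic probabilities are passed in by hand rather than inferred by LDA. The names `UnigramLM`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one-smoothed unigram language model.

    Stand-in for the RNN language model named in the abstract (assumption).
    """

    def __init__(self, docs, vocab):
        self.counts = Counter(w for doc in docs for w in doc)
        self.total = sum(self.counts.values())
        # One extra slot so unseen words still get non-zero probability.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, doc):
        denom = self.total + self.vocab_size
        return sum(math.log((self.counts[w] + 1) / denom) for w in doc)


def train(samples):
    """Train one language model per (topic, label) pair.

    samples: list of (tokens, topic, label); the topic is assumed to come
    from a prior topic-clustering step such as LDA.
    """
    vocab = {w for tokens, _, _ in samples for w in tokens}
    groups = {}
    for tokens, topic, label in samples:
        groups.setdefault((topic, label), []).append(tokens)
    return {key: UnigramLM(docs, vocab) for key, docs in groups.items()}


def classify(doc, topic_probs, models, labels=("pos", "neg")):
    """Combine topic and sentiment evidence (assumed combination rule).

    Returns the best label and the log of
    sum_t P(t | doc) * P(doc | LM(t, label)) for every label.
    """
    scores = {}
    for label in labels:
        terms = [
            math.log(p) + models[(t, label)].log_prob(doc)
            for t, p in topic_probs.items()
            if p > 0
        ]
        # Log-sum-exp keeps long documents from underflowing.
        top = max(terms)
        scores[label] = top + math.log(sum(math.exp(x - top) for x in terms))
    return max(scores, key=scores.get), scores


# Toy data: "scary" is praise in topic 0 (horror) and criticism in topic 1 (family).
samples = [
    (["scary", "thrilling", "great"], 0, "pos"),
    (["dull", "boring", "tame"], 0, "neg"),
    (["warm", "funny", "great"], 1, "pos"),
    (["scary", "boring", "violent"], 1, "neg"),
]
models = train(samples)

print(classify(["scary"], {0: 0.9, 1: 0.1}, models)[0])  # pos
print(classify(["scary"], {0: 0.1, 1: 0.9}, models)[0])  # neg
```

The same one-word review receives opposite labels depending on its topic distribution, which is the effect the abstract attributes to topic clustering: a word whose polarity shifts between topics is scored by models trained only on text from the relevant topic.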

LI Tianchen, YIN Jianping. Sentiment Polarity Discrimination Method Based on Topic Clustering[J]. Journal of Frontiers of Computer Science and Technology, 2016, 10(7): 989-994.

