Journal of Frontiers of Computer Science and Technology, 2013, Vol. 7, Issue (11): 1033-1039. DOI: 10.3778/j.issn.1673-9418.1305008

Method on Building Chinese Text Sentiment Lexicon

YANG Aimin¹⁺, LIN Jianghao², ZHOU Yongmei¹

1. Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510420, China
2. School of Management, Guangdong University of Foreign Studies, Guangzhou 510420, China
Online: 2013-11-01; Published: 2013-11-04

Abstract: Sentiment analysis of massive volumes of Internet text is currently a hot research topic. This paper presents a method for constructing a Chinese text sentiment lexicon. The method selects several sentiment seed words and, using the co-occurrence counts returned by a search engine, computes the sentiment weight of each sentiment word with an improved pointwise mutual information (PMI) algorithm. To examine the validity of the method, the constructed lexicon is applied to text sentiment classification, and its classification results are compared with those of a naïve Bayes classifier on several different corpora. The experimental results indicate that the constructed sentiment lexicon can be used effectively both for sentiment feature selection and directly for sentiment classification, and that its classification performance is stable.
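The abstract does not say how the authors modified PMI, so the following is only a minimal sketch of the classic seed-word approach it builds on, not the paper's improved algorithm. A word's weight is its summed PMI with positive seeds minus its summed PMI with negative seeds (the SO-PMI score). Each probability is estimated as a hit count divided by an assumed total index size. The hit counts here are hypothetical numbers in a dict that stands in for real search-engine queries. The `SMOOTHING` constant, the function names, and the rule that sums the weights of known words are illustrative assumptions. Input text is assumed to be already word-segmented.

```python
import math

# Assumption: small constant added to co-occurrence counts so log2 never sees 0.
SMOOTHING = 0.01


def pmi(hits_xy, hits_x, hits_y, total):
    """PMI(x, y) = log2(p(x, y) / (p(x) * p(y))), with each probability
    estimated as a hit count divided by `total` (assumed index size)."""
    return math.log2(((hits_xy + SMOOTHING) * total) / (hits_x * hits_y))


def so_pmi(word, pos_seeds, neg_seeds, hits, total):
    """Sentiment weight: summed PMI with positive seeds minus summed PMI
    with negative seeds. Seed words are assumed to have nonzero hits."""
    if hits.get(word, 0) == 0:
        return 0.0  # no search-engine evidence for this word

    def seed_score(seeds):
        return sum(
            pmi(hits.get((word, s), 0), hits[word], hits[s], total)
            for s in seeds
        )

    return seed_score(pos_seeds) - seed_score(neg_seeds)


def build_lexicon(candidates, pos_seeds, neg_seeds, hits, total):
    """Map each candidate word to its sentiment weight."""
    return {w: so_pmi(w, pos_seeds, neg_seeds, hits, total) for w in candidates}


def classify(tokens, lexicon):
    """Hypothetical lexicon-based rule: sum the weights of known words."""
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical hit counts, not real search-engine data.
TOTAL = 1_000_000
HITS = {
    "好": 50_000, "差": 40_000,        # seed words: "good", "bad"
    "精彩": 2_000, "糟糕": 1_500,      # candidates: "wonderful", "terrible"
    ("精彩", "好"): 500, ("精彩", "差"): 20,
    ("糟糕", "好"): 30, ("糟糕", "差"): 400,
}
lexicon = build_lexicon(["精彩", "糟糕"], ["好"], ["差"], HITS, TOTAL)
print(classify(["电影", "很", "精彩"], lexicon))  # positive
```

With these toy counts, 精彩 gets a weight of about +4.3 and 糟糕 about -4.1. A segmented sentence containing 精彩 is therefore labelled positive. This summing rule ignores negation and degree adverbs. The same lexicon could also serve as a feature filter in front of a naïve Bayes classifier, which is the other use the abstract reports.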

Key words: sentiment lexicon, sentiment classification, pointwise mutual information (PMI), naïve Bayes