Journal of Frontiers of Computer Science and Technology ›› 2016, Vol. 10 ›› Issue (9): 1320-1331. DOI: 10.3778/j.issn.1673-9418.1507034


Text Sentiment Classification Based on Cloud Model Clustering and Mixed-Fisher Feature

XING Yujuan1+, GUO Xian2, TAN Ping1, LI Ming2   

  1. School of Digital Media, Lanzhou University of Arts and Science, Lanzhou 730000, China
  2. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Online:2016-09-01 Published:2016-09-05

Mixed-Fisher特征云模型聚类在文本情感分类中的应用

邢玉娟1+,郭  显2,谭  萍1,李  明2   

  1. 兰州文理学院 数字媒体学院,兰州 730000
  2. 兰州理工大学 计算机与通信学院,兰州 730050

Abstract: The emergence of massive amounts of Web information has made opinion extraction from documents a research hotspot. To address the ambiguity of natural language and the low classification precision in text sentiment classification, this paper proposes a text sentiment classification algorithm based on Mixed-Fisher feature selection and cloud vector model clustering. The algorithm first computes the Fisher's discriminant ratio of the features of each part of speech. Following the Fisher criterion, under which a larger discriminant ratio indicates stronger discriminative power, the q features with the largest ratios are selected as candidates and combined by part of speech to form the Mixed-Fisher feature vector of each document. A cloud vector model is then generated for each document from its Mixed-Fisher feature set, and the documents are clustered according to the similarity between their cloud vector models. Finally, kernel Fisher discriminant (KFD) is adopted as the classifier to judge the sentiment of each document. The experimental results show that the proposed algorithm achieves higher classification precision than the traditional vector space model, which verifies the effectiveness of KFD.

Key words: text sentiment classification, Fisher discriminant ratio, part-of-speech feature, cloud vector model, kernel Fisher discriminant
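
The feature-selection step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the two-class form of Fisher's discriminant ratio, F = (m1 - m2)^2 / (s1^2 + s2^2), is used; q features are kept per part of speech rather than overall; and feature values are simple per-document counts. The function names `fisher_ratio`, `select_top_q` and `mixed_fisher_vector`, as well as the toy data, are hypothetical and not taken from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio(pos_values, neg_values, eps=1e-12):
    """Two-class Fisher discriminant ratio of a single feature (assumed form)."""
    between = (mean(pos_values) - mean(neg_values)) ** 2
    within = pvariance(pos_values) + pvariance(neg_values)
    # eps avoids division by zero when a feature is constant within both classes
    return between / (within + eps)


def select_top_q(features_by_pos, labels, q):
    """For each part of speech, keep the q features with the largest ratio.

    features_by_pos: {pos_tag: {feature_name: [value per document]}}
    labels: one label per document, 1 = positive, 0 = negative
    """
    selected = {}
    for pos_tag, features in features_by_pos.items():
        scores = {}
        for name, values in features.items():
            pos_vals = [v for v, y in zip(values, labels) if y == 1]
            neg_vals = [v for v, y in zip(values, labels) if y == 0]
            scores[name] = fisher_ratio(pos_vals, neg_vals)
        ranked = sorted(scores, key=scores.get, reverse=True)
        selected[pos_tag] = ranked[:q]
    return selected


def mixed_fisher_vector(selected, features_by_pos, doc_index):
    """Concatenate the selected features of every part of speech for one document."""
    return [features_by_pos[tag][name][doc_index]
            for tag in selected for name in selected[tag]]


# Toy data (hypothetical): four documents, the first two labeled positive.
labels = [1, 1, 0, 0]
features_by_pos = {
    "adj": {"good": [3, 2, 0, 0], "bad": [0, 0, 1, 3], "new": [1, 1, 1, 1]},
    "verb": {"love": [2, 1, 0, 0], "use": [1, 0, 1, 0]},
}

selected = select_top_q(features_by_pos, labels, q=1)
vector_doc0 = mixed_fisher_vector(selected, features_by_pos, doc_index=0)
```

In this toy setting, features whose values differ strongly between the two classes (such as "good" and "love") rank above features that are distributed identically in both classes (such as "new" and "use"), which is the behavior the Fisher criterion is meant to capture. The cloud vector model construction and the KFD classifier are not sketched here, because the abstract does not give enough detail to reproduce them.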

摘要: 海量网络信息的出现,使得提取文本信息情感观点成为研究的热点。针对文本情感分类中文本信息模糊及分类准确率低的问题,提出了一种基于Mixed-Fisher特征选择的文本云向量模型聚类算法。该算法首先分别计算文档中各个词性特征项的Fisher判别比,根据Fisher判别比越大特征向量判别性越强的Fisher准则,选择Fisher比值较大的前q个特征,并按照词性进行组合生成文档的Mixed-Fisher特征向量。然后在Mixed-Fisher特征向量集上构建文档的云向量模型,根据云向量模型间的差异度对模型进行聚类和合并。将该算法应用于文本情感观点的分类,选择核Fisher判别技术用于最终文本观点的判定。仿真实验结果表明,基于Mixed-Fisher特征的云向量聚类模型的分类准确率明显优于传统向量空间模型,从而验证了核Fisher判别技术的有效性。

关键词: 文本情感分类, Fisher判别比, 词性特征, 云向量模型, 核Fisher判别