Journal of Frontiers of Computer Science and Technology, 2017, Vol. 11, Issue (7): 1122-1130. DOI: 10.3778/j.issn.1673-9418.1611062

Deep Learning Algorithm Applied in Tibetan Sentiment Analysis

PU Ciren1+, HOU Jialin2, LIU Yue2, ZHAI Donghai1,2   

1. Tibetan Information Technology Research Center, Tibet University, Lhasa 850000, China
2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
Online: 2017-07-01; Published: 2017-07-07

Abstract: Previous approaches to Tibetan sentiment analysis ignore important information such as sentence structure and word order, which leads to low accuracy. To extract deeper semantic and sentiment information, this paper proposes a Tibetan sentiment analysis approach based on deep learning. First, each Tibetan word is represented by a word vector, so a sentence is represented by the matrix composed of its word vectors. Second, an unsupervised recursive autoencoder converts this matrix into a single sentence vector that captures the most important information, such as sentence meaning and word order. Finally, the classifier in the output layer is trained in a supervised manner on the sentence vectors and their sentiment labels. In the experiments, the paper discusses the choice of word vector dimension and reconstruction error weight, studies how corpus size affects accuracy, and analyzes the relationship between corpus size and training time. The experimental results show that the proposed method improves accuracy by about 8.6% over the semantic space model, which is among the best-performing traditional machine learning methods.
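The encoding step described above can be illustrated with a minimal pure-Python sketch of a greedy recursive autoencoder. It assumes details the abstract does not give: tanh encoder and decoder layers, a squared-error reconstruction loss, untrained random weights, already segmented input, and hypothetical names such as `RecursiveAutoencoder`, `sentence_matrix` and the placeholder tokens `w1` to `w3`. It also omits refinements a trained model would typically need (for example, normalizing parent vectors), so it is a sketch of the general technique rather than the paper's implementation.

```python
import math
import random

def tanh_layer(W, b, x):
    """Return tanh(W x + b) for a matrix W (list of rows) and vectors b, x."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def random_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random weights (untrained, for illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def sentence_matrix(tokens, embeddings):
    """Stack the word vectors of a segmented sentence into a matrix (one row per word)."""
    return [embeddings[t] for t in tokens]

class RecursiveAutoencoder:
    """Hypothetical greedy recursive autoencoder; weights are random, not trained."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Encoder: parent = tanh(W_e [c1; c2] + b_e), W_e has shape dim x 2*dim.
        self.W_e = random_matrix(dim, 2 * dim, rng)
        self.b_e = [0.0] * dim
        # Decoder: [c1'; c2'] = tanh(W_d parent + b_d), W_d has shape 2*dim x dim.
        self.W_d = random_matrix(2 * dim, dim, rng)
        self.b_d = [0.0] * (2 * dim)

    def merge(self, c1, c2):
        """Encode two child vectors into a parent; return (parent, reconstruction error)."""
        children = c1 + c2  # list concatenation [c1; c2]
        parent = tanh_layer(self.W_e, self.b_e, children)
        recon = tanh_layer(self.W_d, self.b_d, parent)
        error = 0.5 * sum((r - c) ** 2 for r, c in zip(recon, children))
        return parent, error

    def encode(self, matrix):
        """Repeatedly merge the adjacent pair with the lowest reconstruction error
        until one vector remains; return (sentence_vector, total_reconstruction_error)."""
        if not matrix:
            raise ValueError("sentence must contain at least one word")
        nodes = [list(v) for v in matrix]
        total_error = 0.0
        while len(nodes) > 1:
            candidates = [self.merge(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
            best = min(range(len(candidates)), key=lambda i: candidates[i][1])
            parent, error = candidates[best]
            total_error += error
            nodes[best:best + 2] = [parent]  # replace the two children with their parent
        return nodes[0], total_error

# Hypothetical 4-dimensional word vectors for three placeholder tokens.
embeddings = {"w1": [0.1, -0.2, 0.3, 0.0],
              "w2": [0.5, 0.1, -0.1, 0.2],
              "w3": [-0.3, 0.4, 0.2, -0.1]}
matrix = sentence_matrix(["w1", "w2", "w3"], embeddings)
rae = RecursiveAutoencoder(dim=4)
sentence_vector, rec_error = rae.encode(matrix)
print(len(sentence_vector), round(rec_error, 4))
```

Because only adjacent nodes are merged, the shape of the resulting tree, and therefore the sentence vector, depends on word order; in a trained model the encoder and decoder weights would be learned by minimizing the summed reconstruction error.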

Key words: deep learning, sentiment analysis, recursive autoencoder, recursive neural networks

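Abstract (Chinese version): To address the problem that earlier Tibetan sentiment analysis algorithms ignore important information such as Tibetan sentence structure and word order, which results in low accuracy, this paper introduces the recursive autoencoder algorithm from the field of deep learning into Tibetan sentiment analysis in order to extract semantic and sentiment information at a deeper level. After Tibetan word segmentation, each word is represented by a word vector, so a Tibetan sentence becomes a matrix composed of word vectors. The unsupervised recursive autoencoder algorithm then vectorizes this matrix, and the resulting optimal Tibetan sentence vector encoding fuses important information such as semantics and word order. Finally, the Tibetan sentence vectors and their corresponding sentiment labels are used to train the output-layer classifier in a supervised manner to predict the sentiment polarity of Tibetan sentences. In the case study, the paper examines how different vector dimensions, reconstruction error coefficients and corpus sizes affect the accuracy of the algorithm, analyzes the relationship between corpus size and model training time, and points out that the number of sentences in the dataset can be reduced appropriately if the model must be trained quickly. The case study shows that, with the best parameter combination, the proposed algorithm is about 8.6% more accurate than the semantic space model, which performs well among traditional machine learning algorithms.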
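The supervised stage described in the abstract can likewise be sketched. The abstract only says that a reconstruction error weight (coefficient) is tuned; this sketch assumes the convex combination alpha * E_rec + (1 - alpha) * E_ce that is commonly used with semi-supervised recursive autoencoders, a two-class softmax output layer, and made-up weights and inputs. The function names `classify` and `node_loss` are hypothetical, not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(W_label, sentence_vector):
    """Output layer: sentiment distribution softmax(W_label x) for sentence vector x."""
    scores = [sum(w * x for w, x in zip(row, sentence_vector)) for row in W_label]
    return softmax(scores)

def node_loss(rec_error, probs, gold_label, alpha):
    """Assumed weighted objective: alpha * reconstruction error
    + (1 - alpha) * cross-entropy against the gold sentiment label."""
    cross_entropy = -math.log(probs[gold_label])
    return alpha * rec_error + (1.0 - alpha) * cross_entropy

# Hypothetical output-layer weights: row 0 = negative, row 1 = positive.
W_label = [[0.2, -0.1, 0.0, 0.3],
           [-0.2, 0.1, 0.4, -0.3]]
x = [0.5, -0.5, 0.25, 0.0]  # made-up sentence vector
probs = classify(W_label, x)
loss = node_loss(rec_error=0.12, probs=probs, gold_label=1, alpha=0.2)
predicted = max(range(len(probs)), key=lambda k: probs[k])
print(probs, loss, predicted)
```

Under this assumed objective, a larger alpha favors faithful reconstruction of the sentence structure and a smaller alpha favors fitting the sentiment labels, which is the kind of trade-off that tuning the reconstruction error weight controls.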
