Deep Learning Algorithm Applied in Tibetan Sentiment Analysis

doi:10.3778/j.issn.1673-9418.1611062

Journal of Frontiers of Computer Science and Technology, 2017, Vol. 11, Issue (7): 1122-1130. DOI: 10.3778/j.issn.1673-9418.1611062



PU Ciren1+, HOU Jialin2, LIU Yue2, ZHAI Donghai1,2

1. Tibetan Information Technology Research Center, Tibet University, Lhasa 850000, China
2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China

Online: 2017-07-01 Published: 2017-07-07

Abstract: Earlier approaches to Tibetan sentiment analysis ignore important information such as sentence structure and word order, which leads to low accuracy. To extract semantic and sentiment information at a deeper level, this paper introduces the recursive autoencoder algorithm from deep learning into Tibetan sentiment analysis. First, after word segmentation, each Tibetan word is represented by a word vector, so that a sentence becomes a matrix composed of its word vectors. Second, an unsupervised recursive autoencoder converts this matrix into a single vector; the resulting sentence encoding fuses important information such as semantics and word order. Finally, the classifier in the output layer is trained in a supervised manner on the sentence vectors and their corresponding sentiment labels to predict the sentiment polarity of Tibetan sentences. The experiments examine how the vector dimension, the reconstruction error coefficient, and the corpus size affect accuracy, and analyze the relationship between corpus size and training time, noting that the number of sentences in the dataset can be reduced moderately when the model must be trained quickly. The experimental results show that, with the best parameter combination, the accuracy of the proposed algorithm is about 8.6% higher than that of the semantic space model, one of the better-performing traditional machine learning algorithms.

Key words: deep learning, sentiment analysis, recursive auto encoder, recursive neural networks
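
The three steps summarized in the abstract (word vectors, recursive autoencoding into a sentence vector, supervised classification) can be illustrated with a minimal sketch. The abstract does not give the network details, so the code below assumes a greedy recursive autoencoder of the kind commonly used for sentence-level sentiment analysis: adjacent vectors are merged pairwise, each time picking the pair with the lowest reconstruction error, until a single sentence vector remains. The vector dimension, the random untrained weights, the two-class softmax, and the weight `alpha` standing in for the reconstruction error coefficient are all illustrative assumptions, not values or names taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4          # word-vector dimension (hypothetical value)
N_CLASSES = 2    # e.g. negative / positive (assumption)

# Randomly initialised parameters; a real model would learn these by training.
W_enc = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
b_enc = np.zeros(DIM)
W_dec = rng.normal(scale=0.1, size=(2 * DIM, DIM))
b_dec = np.zeros(2 * DIM)
W_cls = rng.normal(scale=0.1, size=(N_CLASSES, DIM))
b_cls = np.zeros(N_CLASSES)


def encode_pair(left, right):
    """Merge two child vectors; return (parent vector, reconstruction error)."""
    children = np.concatenate([left, right])
    parent = np.tanh(W_enc @ children + b_enc)
    reconstruction = W_dec @ parent + b_dec
    error = 0.5 * float(np.sum((reconstruction - children) ** 2))
    return parent, error


def encode_sentence(word_matrix):
    """Greedily collapse an (n_words, DIM) matrix into one sentence vector."""
    nodes = list(word_matrix)
    total_error = 0.0
    while len(nodes) > 1:
        # Try every adjacent pair and keep the one that reconstructs best.
        candidates = [encode_pair(nodes[i], nodes[i + 1])
                      for i in range(len(nodes) - 1)]
        best = min(range(len(candidates)), key=lambda i: candidates[i][1])
        parent, error = candidates[best]
        nodes[best:best + 2] = [parent]   # replace the pair with its parent
        total_error += error
    return nodes[0], total_error


def predict(sentence_vector):
    """Softmax output layer over sentiment classes."""
    logits = W_cls @ sentence_vector + b_cls
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def combined_loss(rec_error, probs, label, alpha=0.2):
    """Weighted sum of reconstruction error and cross-entropy (alpha is assumed)."""
    cross_entropy = -float(np.log(probs[label]))
    return alpha * rec_error + (1.0 - alpha) * cross_entropy


# A stand-in "sentence" of 5 segmented words, each a random DIM-sized vector.
sentence = rng.normal(size=(5, DIM))
vec, rec = encode_sentence(sentence)
probs = predict(vec)
loss = combined_loss(rec, probs, label=1)
```

Because the merge order depends on the reconstruction error of adjacent pairs, the resulting sentence vector is sensitive to word order, which is the property the abstract emphasizes. In this sketch `alpha` trades off reconstruction quality against classification accuracy; a trained model would update all weight matrices by gradient descent on `combined_loss`.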

PU Ciren, HOU Jialin, LIU Yue, ZHAI Donghai. Deep Learning Algorithm Applied in Tibetan Sentiment Analysis[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(7): 1122-1130.

