Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (9): 1563-1570.DOI: 10.3778/j.issn.1673-9418.1910037


SentiBERT: Pre-training Language Model Combining Sentiment Information

YANG Chen, SONG Xiaoning, SONG Wei   

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  Online: 2020-09-01    Published: 2020-09-07





Pre-training language models on large-scale unsupervised corpora is attracting the attention of researchers in the field of natural language processing. Existing models mainly extract the semantic and structural features of text in the pre-training stage. Aiming at sentiment tasks and complex emotional features, this paper proposes a pre-training method that focuses on learning sentiment features, built on the recent pre-training language model BERT (bidirectional encoder representations from transformers). In the further pre-training stage, this paper improves the pre-training task of BERT with the help of a sentiment dictionary. At the same time, this paper uses a context-based word sentiment prediction task, which classifies the sentiment of masked words, to acquire textual representations biased towards sentiment features. Finally, fine-tuning is performed on small labeled data sets. Experimental results show that, compared with the original BERT model, the accuracy on sentiment tasks is improved by 1 percentage point, and more competitive results are achieved on small training sets.
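The abstract describes masking guided by a sentiment dictionary, where each masked sentiment word also supplies a label for the word sentiment prediction task. The paper does not give implementation details, so the sketch below is only a minimal illustration of that idea: the lexicon, function name, and masking probability are all hypothetical, and the random 15% masking follows standard BERT practice rather than anything stated in the abstract.

```python
import random

# Hypothetical tiny sentiment lexicon standing in for a real sentiment
# dictionary; values are word-level sentiment labels (1 = positive, 0 = negative).
SENTIMENT_LEXICON = {"great": 1, "love": 1, "terrible": 0, "awful": 0}

def sentiment_guided_masking(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Mask sentiment-bearing words (per the lexicon) and record their
    sentiment labels for the auxiliary word-sentiment prediction task;
    remaining tokens get ordinary BERT-style random masking."""
    rng = random.Random(seed)
    masked = list(tokens)
    sentiment_labels = {}  # position -> sentiment label of the masked word
    for i, tok in enumerate(tokens):
        if tok in SENTIMENT_LEXICON:
            masked[i] = mask_token
            sentiment_labels[i] = SENTIMENT_LEXICON[tok]
        elif rng.random() < mask_prob:
            masked[i] = mask_token
    return masked, sentiment_labels
```

During further pre-training, the model would then be trained jointly: the usual masked-language-modeling loss on all masked positions, plus a sentiment classification loss on the positions recorded in `sentiment_labels`.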

Key words: bidirectional encoder representations from transformers (BERT), sentiment classification, pre-training language models, multi-task learning
