Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (4): 1010-1020. DOI: 10.3778/j.issn.1673-9418.2311023

• Artificial Intelligence · Pattern Recognition •

Research on Sentiment Analysis of Short Video Network Public Opinion by Integrating BERT Multi-level Features

HAN Kun, PAN Hongpeng, LIU Zhongyi

  1. School of Public Security Management, People’s Public Security University of China, Beijing 100038, China
  • Online: 2024-04-01 Published: 2024-04-01

Abstract: In the era of self-media, with social networking apps in widespread use, short video platforms readily become “incubators” in which public opinion events originate and escalate. Analyzing the comments posted on these platforms is therefore important for the early warning, handling, and guidance of such events. To this end, this paper combines BERT and TextCNN to propose a text classification model that fuses multi-level BERT features (BERT-MLFF-TextCNN), and applies it to sentiment analysis of relevant comment text from the Douyin short video platform. First, the pre-trained BERT model encodes the input text. Second, the semantic feature vectors from each encoding layer are extracted and fused. Third, a self-attention mechanism is applied to highlight the key features so that they are used effectively. Finally, the resulting feature sequence is fed into the TextCNN model for classification. Experimental results show that BERT-MLFF-TextCNN outperforms the BERT-TextCNN, GloVe-TextCNN, and Word2vec-TextCNN models, reaching an F1 score of 0.977. The model can effectively identify the sentiment orientation of public opinion on short video platforms. On this basis, the TextRank algorithm is used for topic mining, which makes it possible to visualize the topic words associated with each sentiment polarity of the comments and provides a decision-making reference for the departments responsible for public opinion management.

Key words: network public opinion, sentiment analysis, theme visualization, BERT
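
The four stages named in the abstract (per-layer BERT features, fusion, self-attention, TextCNN) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Random arrays stand in for the BERT hidden states, and the fusion rule (a softmax-weighted sum over layers), the single-head attention, the kernel sizes (2, 3, 4), the layer count, the dimensions, and the two-class output are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_layers(hidden_states, layer_logits):
    # hidden_states: (num_layers, seq_len, hidden). Assumed fusion rule:
    # softmax-weighted sum over the layer axis -> (seq_len, hidden).
    weights = softmax(layer_logits)
    return np.tensordot(weights, hidden_states, axes=1)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention over the token axis.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def text_cnn(x, filters, w_out, b_out):
    # filters: list of arrays shaped (kernel_size, hidden, num_filters).
    pooled = []
    for f in filters:
        k = f.shape[0]
        # Sliding windows over tokens: (seq_len - k + 1, k, hidden).
        windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
        conv = np.tensordot(windows, f, axes=([1, 2], [0, 1]))
        # ReLU followed by max-over-time pooling -> (num_filters,).
        pooled.append(np.maximum(conv, 0.0).max(axis=0))
    features = np.concatenate(pooled)
    return softmax(features @ w_out + b_out)

# Illustrative sizes only; none of these come from the paper.
rng = np.random.default_rng(0)
num_layers, seq_len, hidden, num_filters, num_classes = 12, 16, 32, 8, 2

# Stand-in for the per-layer outputs of a BERT encoder.
hidden_states = rng.normal(size=(num_layers, seq_len, hidden))
layer_logits = np.zeros(num_layers)  # equal layer weights before any training

fused = fuse_layers(hidden_states, layer_logits)
wq, wk, wv = (rng.normal(scale=hidden ** -0.5, size=(hidden, hidden)) for _ in range(3))
attended = self_attention(fused, wq, wk, wv)

filters = [rng.normal(scale=0.1, size=(k, hidden, num_filters)) for k in (2, 3, 4)]
w_out = rng.normal(scale=0.1, size=(len(filters) * num_filters, num_classes))
b_out = np.zeros(num_classes)
probs = text_cnn(attended, filters, w_out, b_out)
```

In a real setting the stand-in array would be replaced by the encoder's per-layer outputs (with Hugging Face Transformers, for example, these are returned when `output_hidden_states=True` is set), and all weights would be learned rather than sampled.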
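
The topic-mining step can likewise be sketched with a small, dependency-free TextRank keyword ranker. It assumes the comments have already been segmented into words and filtered for stop words (Chinese text would need a word segmenter first). The function name `textrank_keywords`, the window size, the damping factor, the iteration count, and the sample tokens are hypothetical choices for illustration, not settings reported in the paper.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=5):
    # Build an undirected co-occurrence graph: each word is linked to the
    # words that follow it within the sliding window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style iteration: each word receives an equal share of the
    # score of every neighbor.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]

# Hypothetical, pre-segmented tokens from negative comments.
tokens = ["service", "slow", "refund", "service", "rude",
          "staff", "service", "refund", "delay"]
keywords = textrank_keywords(tokens, top_k=3)
```

Running the ranker separately on comments of each predicted polarity would give one keyword list per sentiment class, which could then feed a word cloud or a similar visualization.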