Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue (3): 410-425. DOI: 10.3778/j.issn.1673-9418.1903065

• Academic Research

LDA-DeepHawkes Model for Predicting Information Cascade

WANG Shijie, ZHOU Lihua, KONG Bing, ZHOU Junhua

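  1. School of Information Science & Engineering, Yunnan University, Kunming 650504, China
  2. School of Public Administration, Yunnan University, Kunming 650504, China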
  • Online: 2020-03-01  Published: 2020-03-13

Abstract:

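Predicting the future propagation range of a piece of information from its early propagation characteristics has broad practical value. The DeepHawkes model combines the Hawkes model with deep learning. It inherits the high interpretability of the Hawkes model in characterizing and modeling the information diffusion process, and it also gains the high predictive accuracy of deep learning, which learns the latent features relevant to popularity prediction automatically. In this way it bridges the gap between predicting and understanding information cascades that exists in traditional methods. However, DeepHawkes ignores the influence that the text content of the information itself has on its propagation. Building on DeepHawkes, this paper proposes the LDA-DeepHawkes model, which considers both cascade factors and text content. It models the information diffusion process more comprehensively and further improves prediction accuracy while retaining the high interpretability of DeepHawkes. The prediction accuracy of LDA-DeepHawkes is compared with that of other models on two Sina Weibo datasets, and the influence of the model's parameters on prediction performance is analyzed. Experimental results show that LDA-DeepHawkes achieves better prediction accuracy, indicating that the text content of information is also an important factor affecting information diffusion.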

Key words: popularity prediction, information cascade, Hawkes process, deep learning, latent Dirichlet allocation (LDA) topic model

Abstract:

Predicting the future propagation range of information from its early propagation characteristics is an important research topic in social network analysis. The DeepHawkes model combines the Hawkes model with deep learning. It inherits the clear interpretability of the Hawkes model in characterizing and modeling the information diffusion process, and it also retains the high predictive power of end-to-end deep learning, which automatically learns latent representations of the input data. Together, these properties bridge the gap between predicting and understanding information cascades. However, DeepHawkes ignores the effect of text content on propagation. The proposed LDA-DeepHawkes model takes both cascade factors and text content into account and models the information diffusion process more comprehensively, further improving prediction accuracy while inheriting the high interpretability of DeepHawkes. The prediction accuracy of LDA-DeepHawkes is compared with that of other models on two real datasets from Sina Weibo, and the influence of the model's parameters on prediction accuracy is analyzed. Experimental results show that LDA-DeepHawkes achieves better prediction accuracy, indicating that the text content of information is also an important factor affecting information diffusion.

Key words: popularity prediction, information cascade, Hawkes process, deep learning, latent Dirichlet allocation (LDA) topic model
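The abstract builds on the Hawkes model's view of diffusion as a self-exciting process, and it describes LDA-DeepHawkes as adding text content on top of cascade factors. The paper's exact architecture is not given here, so what follows is only a minimal illustrative sketch under stated assumptions. It assumes a parametric exponential time-decay kernel, which may not be the kernel DeepHawkes actually uses. It also assumes per-retweet excitation weights and a hypothetical `topic_factor` that scales those weights by an LDA topic distribution `theta` through hypothetical per-topic weights `w`. All function and parameter names are illustrative. This is not the authors' LDA-DeepHawkes implementation.

```python
import math
from typing import Sequence


def topic_factor(theta: Sequence[float], w: Sequence[float]) -> float:
    """Hypothetical text effect: exp(w . theta), always a positive multiplier.

    theta is an (assumed) LDA topic distribution for the message and w holds
    one (assumed, learned) weight per topic; neither comes from the paper.
    """
    if len(theta) != len(w):
        raise ValueError("theta and w must have the same length")
    return math.exp(sum(p * wk for p, wk in zip(theta, w)))


def intensity(t: float, times: Sequence[float], alphas: Sequence[float],
              beta: float, mu: float = 0.0) -> float:
    """Hawkes intensity mu + sum_i alpha_i * beta * exp(-beta * (t - t_i)).

    Only events strictly before t contribute (self-excitation). The kernel
    beta * exp(-beta * dt) integrates to 1, so alpha_i is the expected
    number of direct retweets triggered by event i.
    """
    return mu + sum(a * beta * math.exp(-beta * (t - ti))
                    for ti, a in zip(times, alphas) if ti < t)


def expected_new_children(T: float, horizon: float, times: Sequence[float],
                          alphas: Sequence[float], beta: float) -> float:
    """Expected first-generation retweets in (T, T + horizon] triggered by
    events observed up to T (background rate mu excluded).

    Closed form of integrating each kernel term over the window:
    alpha_i * exp(-beta * (T - t_i)) * (1 - exp(-beta * horizon)).
    """
    window_mass = 1.0 - math.exp(-beta * horizon)
    return sum(a * math.exp(-beta * (T - ti)) * window_mass
               for ti, a in zip(times, alphas) if ti <= T)


# Usage with made-up numbers (illustration only, not data from the paper).
observed_times = [0.0, 1.0, 2.5]   # retweet times observed up to T = 3
user_influence = [0.5, 0.3, 0.4]   # hypothetical per-user excitation weights
theta = [0.7, 0.3]                 # hypothetical LDA topic mixture
w = [0.2, -0.1]                    # hypothetical per-topic weights
alphas = [u * topic_factor(theta, w) for u in user_influence]
forecast = expected_new_children(3.0, 24.0, observed_times, alphas, beta=0.5)
```

Multiplying every excitation weight by one positive topic factor is just one simple way to let text content raise or lower a cascade's expected growth. The paper's actual mechanism for combining LDA features with DeepHawkes may differ.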