Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2014, Vol. 8 ›› Issue (6): 712-718. DOI: 10.3778/j.issn.1673-9418.1401005

• Artificial Intelligence and Pattern Recognition •

Fusion Probabilistic Graphical Model on Heterogeneous Information Network Data

WU Lei (吴蕾)+, ZHANG Wensheng (张文生), WANG Jue (王珏)

  1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Publication date: 2014-06-01   Online release date: 2014-05-30

Abstract: Heterogeneous information networks contain multiple types of data objects and multiple types of links between them. To handle this diversity, this paper combines multiple topic models with a Markov logic network and proposes a fusion probabilistic graphical model. Each topic model describes the topic distribution within one data-object subspace, which serves as preprocessing for the different object types. A Markov logic network, built from linking rules expressed as first-order logic clauses, then connects the data objects across the individual topic models. Parameter learning and inference on the heterogeneous network are carried out with Gibbs sampling. Experiments on DBLP, a widely used public heterogeneous information network dataset, indicate that the fusion probabilistic graphical model better represents the different data objects and link relations. In a comparison with four typical classification methods, the results are stable across repeated sampling runs, and the model achieves comparatively good classification results for authors, papers, and conferences.

Key words: probabilistic graphical model, topic model, Markov logic network, heterogeneous information network
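
The pipeline outlined in the abstract (per-object topic distributions, linking rules with weights, Gibbs sampling for inference) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the actual rules, weights, or learning procedure. The sketch assumes a single hypothetical rule, "linked objects tend to share a topic label", with an assumed weight; as in a Markov logic network, each satisfied grounding of the rule adds its weight to the log-probability of a labeling. The function name `gibbs_labels`, the `theta` values, and the object names are all invented for illustration.

```python
import math
import random
from collections import Counter


def gibbs_labels(theta, links, weight=1.5, sweeps=500, burn_in=100, seed=0):
    """Estimate per-object label marginals by Gibbs sampling.

    theta  : dict object -> list of K topic probabilities (a hypothetical
             stand-in for the output of the separate topic models).
    links  : list of (object, object) pairs, e.g. author-paper or
             paper-conference edges.
    weight : assumed weight of the single rule "linked objects share a label".
    """
    rng = random.Random(seed)
    k = len(next(iter(theta.values())))
    neighbors = {obj: [] for obj in theta}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Start each object at its most probable topic.
    label = {obj: max(range(k), key=p.__getitem__) for obj, p in theta.items()}
    counts = {obj: Counter() for obj in theta}

    for sweep in range(sweeps):
        for obj in theta:
            # Unnormalised conditional: topic-model evidence times
            # exp(weight * number of neighbours currently holding label c).
            agree = Counter(label[n] for n in neighbors[obj])
            scores = [theta[obj][c] * math.exp(weight * agree[c]) for c in range(k)]
            label[obj] = rng.choices(range(k), weights=scores)[0]
        if sweep >= burn_in:
            # Record one sample per object after the burn-in period.
            for obj in theta:
                counts[obj][label[obj]] += 1

    kept = sweeps - burn_in
    return {obj: [counts[obj][c] / kept for c in range(k)] for obj in theta}


# Toy network: the paper has an uninformative topic distribution, but it is
# linked to an author and a conference that both lean towards topic 0.
theta = {
    "author_1": [0.9, 0.1],
    "paper_1": [0.5, 0.5],
    "conf_1": [0.8, 0.2],
}
links = [("author_1", "paper_1"), ("paper_1", "conf_1")]
marginals = gibbs_labels(theta, links)
```

In this toy setting the link rule pulls `paper_1` towards the label favoured by its neighbours, which is the kind of cross-type evidence sharing the abstract attributes to the Markov logic network layer. Learning the rule weight from data, which the paper also performs, is left out of the sketch.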