Journal of Frontiers of Computer Science and Technology, 2014, Vol. 8, Issue 9: 1076-1084. DOI: 10.3778/j.issn.1673-9418.1403058

• Academic Research •

An Efficient Algorithm for General Community Detection in Content Networks

CHAI Bianfang1,2, ZHAO Xiaopeng3, JIA Caiyan1, YU Jian1+   

  1. Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
  2. Department of Information Engineering, Shijiazhuang University of Economics, Shijiazhuang 050031, China
  3. Composite Tax Management Office, Hebei Financial Department, Shijiazhuang 050051, China
  • Online: 2014-09-01 Published: 2014-09-03

Abstract: Without any prior knowledge about a network, the PPSB-DC model (popularity and productivity stochastic block model and discriminative content model) describes the network's generative process using both content and links, which enables it to detect general communities and to identify the link patterns between communities. However, the parameter estimation algorithm for this probabilistic model is time-consuming and is sensitive to the initial values of the link-pattern parameters, which limits the model's application. This paper improves the parameter estimation algorithm and proposes EPPSBDC (efficient PPSB-DC), an efficient algorithm for general community detection in content networks. EPPSBDC speeds up computation through sampling and parallelization, and reduces the sensitivity to initial parameters by introducing a prior on the link probabilities. Comparisons with similar algorithms on content networks demonstrate the effectiveness of EPPSBDC.

Key words: general community detection, massive content networks, stochastic block model, sampling