计算机科学与探索 ›› 2011, Vol. 5 ›› Issue (4): 336-346.

• 学术研究 • 上一篇    下一篇

复杂网络社区挖掘的距离相似度算法

李兆南1,2, 杨 博1,2, 刘大有1,2   

  1. 1. 吉林大学 计算机科学与技术学院, 长春 130012
    2. 吉林大学 符号计算与知识工程教育部重点实验室, 长春 130012
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2011-04-01 发布日期:2011-04-01

Distance Similarity Algorithm for Mining Communities from Complex Networks

LI Zhaonan1,2, YANG Bo1,2, LIU Dayou1,2   

  1. 1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
    2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin Univer-sity, Changchun 130012, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2011-04-01 Published:2011-04-01

摘要: 有效挖掘出复杂网络中隐藏的社区结构具有重要的理论研究意义和广泛的应用前景, 目前已有多种关于社区挖掘算法和社区性质的研究, 但还未见深入讨论结点间距离与全局社区结构内在关系的工作。因此, 深入研究了它们之间的内在联系, 发现较近(远)的结点通常以较大的概率属于相同(不同)社区, 相同(不同)社区中的结点距离通常较小(较大)。基于以上启发信息, 提出了基于结点距离相似度的社区挖掘算法(distance similarity algorithm, DSA), 采用基准数据集测试和分析了DSA算法。实验结果表明:DSA算法能够准确挖掘出隐藏在实验网络中的全部社区及其所构成的层次结构。

关键词: 复杂网络, 社区结构, 数据挖掘

Abstract: The ability to efficiently and effectively mine community structures from real-world complex networks is fundamental for both theoretical research and practical applications. Although there are many works with regard to community mining, few of them study the connections between the local distance among nodes and the global community structures of networks. This paper studies the issue and finds that the nodes with nearer (or further) dis-tance tend to belong to the same (or distinct) community with bigger probabilities, meanwhile the distance among nodes within the same (or distinct) communities tend to be smaller (or larger). Based on the heuristic in terms of the link between distance and community structure, the paper proposes a distance-based similarity measure as well as a novel community mining algorithm DSA, and validates the DSA through rigorously testing it against several benchmark networks. The experimental results show that the DSA is able to accurately discover the potential com-munities with their hierarchical structures from the tested benchmark networks.

Key words: complex network, community structure, data mining