Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (10): 1930-1937.DOI: 10.3778/j.issn.1673-9418.2007004

• Artificial Intelligence • Previous Articles     Next Articles

Fast Graph Clustering Algorithm Based on Selection of Key Nodes

YOU Fangzhou, BAI Liang   

  1. 1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
    2. Key Laboratory Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
  • Online:2021-10-01 Published:2021-09-30

关键节点选择的快速图聚类算法

尤坊州白亮   

  1. 1. 山西大学 计算机与信息技术学院,太原 030006
    2. 山西大学 计算机智能与中文信息处理教育部重点实验室,太原 030006

Abstract:

Spectral clustering has attracted extensive attention as a typical graph clustering algorithm among clustering algorithms since it has really strong adaptability to complex data distribution and great clustering effect. However, it is difficult to apply spectral clustering algorithm to large scale data due to the high time complexity. To address this issue, a fast graph clustering algorithm based on the selection of key nodes is proposed. This algorithm consists of three steps. Firstly, a fast node weight evaluation method is established based on thorough consideration of the clustering and separateness. Secondly, the key nodes are selected to replace the original data set to construct a bipartite graph, and the approximated eigenvectors of the data are obtained by singular value decomposition. Thirdly, multiple approximated eigenvectors are integrated to improve the robustness of the approximated spectral clustering results. The time complexity has been reduced from [O(n3)] to [O(t(n+2n2))], facilitating the application of spectral clustering algorithm to large scale data. Through experimental analysis of this algorithm against other 7 representative spectral clustering algorithms on 5 Benchmark data sets, the comparative results demonstrate that this algorithm can identify complex class structures in data more efficiently than other clustering algorithms.

Key words: cluster analysis, graph clustering, spectral clustering, cluster ensemble, selection of key nodes

摘要:

在众多聚类算法中,谱聚类作为一种代表性的图聚类算法,由于其对复杂数据分布的适应性强、聚类效果好等优点而受到人们的广泛关注。然而,由于其高计算时间复杂度难以应用于处理大规模数据。为提高谱聚类算法在大规模数据集上的可用性,提出关键节点选择的快速图聚类算法。该算法包含三个重要步骤:第一,提出一种充分考虑抱团性和分离性的快速节点重要性评价方法;第二,选择关键节点代替原数据集构建二分图,通过奇异值分解获得数据的近似特征向量;第三,集成多次的近似特征向量,提高近似谱聚类结果的鲁棒性。该算法将时间复杂度由谱聚类原有的[O(n3)]降低到[O(t(n+2n2))],增强了其在大规模数据集上的可用性。通过该算法与其他七个具有代表性的谱聚类算法在五个Benchmark数据集上进行的实验分析,比较结果展示了该算法相比其他算法能够更加高效地识别数据中的复杂类结构。

关键词: 聚类分析, 图聚类, 谱聚类, 聚类集成, 关键节点选择