计算机科学与探索 ›› 2013, Vol. 7 ›› Issue (8): 729-735.DOI: 10.3778/j.issn.1673-9418.1305005

• 学术研究 • 上一篇    下一篇

利用核心集粗化的多层聚类算法

马儒宁1,王  萍1,丁军娣2+   

  1. 1. 南京航空航天大学 理学院,南京 211100
    2. 南京理工大学 计算机科学与工程学院,南京 210094
  • 出版日期:2013-08-01 发布日期:2013-08-06

Multilevel Clustering Algorithm Using Core-Sets Coarsening

MA Runing1, WANG Ping1, DING Jundi2+   

  1. 1. College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
    2. College of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Online:2013-08-01 Published:2013-08-06

摘要: 粗化是多层聚类算法中的关键步骤。经典的多层聚类算法,如METIS(multilevel scheme for partitioning irregular graphs)、Graclus等,利用顶点和边权的若干准则合并顶点和边,实现粗化,其缺点是粗化之后的小规模数据集无法准确表述原数据集的全局信息和结构。提出了核心集粗化(core-sets coarsening)的方法,通过定义“多层核心集”,逐层保留数据集的全局信息。同时,顶层核心点的个数与聚类个数相同,其每个核心点对应一个单独的类,因此不需要一般多层聚类中的划分过程。实验结果表明了该算法的有效性。

关键词: 多层, 粗化, 核心集, 聚类

Abstract: Coarsening phase is the most critical step among procedures in multilevel clustering algorithm. Some classical multilevel clustering algorithms, such as METIS (multilevel scheme for partitioning irregular graphs) and Graclus, use some criterions of vertex and edge weights to capture the collapsing of the vertex and edges and realize coarsening procedure. But there is the disadvantage that the coarsest dataset can not formulate the global information and structure of original dataset correctly. This paper proposes a core-sets coarsening method, which defines multilevel core-sets to retain global information of layered dataset in perspective. Meanwhile, as the coarsest dataset has the same number as clustering, and each core point corresponds to a single class, the partitioning procedure need not be considered. Some numerical experiments verify the superiority and availability of the proposed algorithm.

Key words: multilevel, coarsening, core-sets, clustering