Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (5): 931-940. DOI: 10.3778/j.issn.1673-9418.2006060

• Artificial Intelligence •

Peak Density Clustering Algorithm Combining Natural and Shared Nearest Neighbor

BAI Exiang, LUO Ke, LUO Xiao   

  1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China
  2. State Grid Shanghai Municipal Electric Power Company, Shanghai 200000, China
  • Online: 2021-05-01 Published: 2021-04-30

结合自然和共享最近邻的密度峰值聚类算法

柏锷湘, 罗可, 罗潇

  1. 长沙理工大学 计算机与通信工程学院,长沙 410114
  2. 国网上海电力公司,上海 200000

Abstract:

Clustering by fast search and find of density peaks (DPC) requires no iteration and few parameters, but it has two shortcomings: the cutoff distance parameter must be selected manually, and the algorithm performs poorly on manifold data sets. To address these problems, an improved density peak clustering algorithm is proposed. The algorithm combines the natural nearest neighbor and shared nearest neighbor algorithms to redefine how the cutoff distance and local density are calculated. It also incorporates the concept of candidate cluster centers: the algorithm selects a set of candidate cluster centers, treats these candidates as a new data set, and runs density peak clustering again on them. Finally, the remaining points are assigned to the clusters that contain their corresponding candidate centers. The improved algorithm is verified on synthetic data sets and UCI data sets, and compared with the K-means, DBSCAN (density-based algorithm for discovering clusters in large spatial databases with noise) and DPC algorithms. Experimental results show that the proposed algorithm achieves a marked improvement in clustering performance.

Key words: density peak clustering algorithm, natural nearest neighbor, shared nearest neighbor
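
For orientation, the building blocks named in the abstract can be sketched in Python with NumPy. This is a minimal illustration of the baseline DPC procedure and of a shared nearest neighbor count, not the improved algorithm proposed in the paper: the redefined cutoff distance and local density, the natural nearest neighbor search, and the candidate-center stage are not reproduced here. The function names, the fixed `d_c`, the neighbor count `k`, and the choice of centers by the largest product of density and distance (instead of reading a decision graph) are all assumptions made for this sketch.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def shared_neighbor_counts(X, k):
    """S[i, j] = number of points that appear in the k-nearest-neighbor
    lists of both i and j (hypothetical helper for illustration)."""
    dist = pairwise_dist(X)
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest points
    n = len(X)
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1   # member[i, p] = 1 if p is in kNN(i)
    return member @ member.T

def dpc_baseline(X, d_c, n_clusters):
    """Baseline DPC with a manually chosen cutoff distance d_c."""
    dist = pairwise_dist(X)
    n = len(X)
    # Local density: number of other points closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from densest to sparsest; ties are broken by index,
    # so earlier points in `order` count as "higher density".
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()         # densest point: largest distance
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]                # distance to nearest denser point
        nearest_higher[i] = j
    # Assumption: take the n_clusters points with the largest rho * delta.
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta
```

A call such as `dpc_baseline(X, d_c=0.12, n_clusters=2)` returns one label per row of `X`. The dependence of the result on the hand-picked `d_c` is the shortcoming the abstract refers to.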

摘要:

基于快速搜索和寻找密度峰值聚类算法(DPC)具有无需迭代且需要较少参数的优点,但其仍然存在一些缺点:需要人为选取截断距离参数;在流形数据集上的处理效果不佳。针对这些问题,提出一种密度峰值聚类改进算法。该算法结合了自然和共享最近邻算法,重新定义了截断距离和局部密度的计算方法,并且算法融合了候选聚类中心计算概念,通过算法选出不同的候选聚类中心,然后以这些候选中心为新的数据集,再次开始密度峰值聚类,最后将剩余的点分配到所对应的候选中心点所在类簇中。改进的算法在合成数据集和UCI数据集上进行验证,并与K-means、DBSCAN和DPC算法进行比较。实验结果表明,提出的算法在性能方面有明显提升。

关键词: 密度峰值聚类算法, 自然最近邻, 共享最近邻