Journal of Frontiers of Computer Science and Technology ›› 2019, Vol. 13 ›› Issue (4): 711-720. DOI: 10.3778/j.issn.1673-9418.1804033

• Theory and Algorithm •

Optimized Density Peak Clustering Algorithm by Natural Nearest Neighbor

JIN Hui+, QIAN Xuezhong   

  1. Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2019-04-01 Published: 2019-04-10

Abstract: Existing density-based clustering algorithms are sensitive to their parameters and perform poorly on non-spherical data and complex manifold data. To address these problems, a new clustering algorithm based on density peaks is proposed. The algorithm first determines the local density of each data point using the concept of natural nearest neighbors. It then identifies cluster centers from the property that density peaks have the highest local density and are separated from one another by sparse regions. Finally, a new notion of similarity between clusters is introduced to handle complex manifold data. In experiments on synthetic and real-world data sets, the algorithm outperforms DPC (clustering by fast search and find of density peaks), DBSCAN (density-based spatial clustering of applications with noise), and K-means, and its advantage is especially pronounced on non-spherical data and complex manifold data.

Key words: density peak, natural nearest neighbor, local density, sparse regions, similarity between clusters
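
The abstract gives no formulas, so the following Python sketch is only an illustration of the first two steps it describes, not the authors' implementation. Several parts are assumptions: the stopping rule in `natural_neighbor_search` follows a common formulation of natural neighbor search (grow the neighborhood size until every point is some other point's neighbor, or the number of such unreached points stops changing); the local density is taken as the inverse of the mean distance to those neighbors; and cluster centers are picked with the density-times-distance score of the original DPC method, with the number of clusters supplied by the caller, rather than with the paper's sparse-region criterion. The similarity-between-clusters step is not covered. All function and variable names are hypothetical.

```python
import math


def natural_neighbor_search(points):
    """Return (r, order, dist): the neighborhood size r found by the search,
    each point's neighbors sorted by distance, and the distance matrix."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    order = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
             for i in range(n)]
    reverse_count = [0] * n  # how often each point is chosen as a neighbor
    prev_unreached = None
    r = 0
    while r < n - 1:
        for i in range(n):
            reverse_count[order[i][r]] += 1
        r += 1
        unreached = reverse_count.count(0)
        # Assumed stopping rule: stop when nobody is left out, or no progress.
        if unreached == 0 or unreached == prev_unreached:
            break
        prev_unreached = unreached
    return r, order, dist


def density_peak_clusters(points, n_clusters):
    """Cluster points; returns (labels, centers, r). Needs at least 2 points."""
    r, order, dist = natural_neighbor_search(points)
    n = len(points)
    # Assumed density: inverse mean distance to the r nearest neighbors.
    rho = [r / (sum(dist[i][j] for j in order[i][:r]) + 1e-12) for i in range(n)]
    ranked = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n   # distance to the nearest denser point
    parent = [-1] * n   # index of that nearest denser point
    for pos, i in enumerate(ranked):
        if pos == 0:
            delta[i] = max(dist[i])  # global peak: use its largest distance
        else:
            parent[i] = min(ranked[:pos], key=lambda j: dist[i][j])
            delta[i] = dist[i][parent[i]]
    # DPC-style score: dense points that are far from any denser point.
    centers = sorted(ranked, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in ranked:  # every non-center inherits its parent's label
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, r


if __name__ == "__main__":
    demo = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    labels, centers, r = density_peak_clusters(demo, 2)
    print("neighborhood size:", r)
    print("centers:", centers)
    print("labels:", labels)
```

On two well-separated groups such as the demo points, the middle point of each group should be selected as a center. Unlike DPC's cutoff distance or DBSCAN's radius, the neighborhood size here is derived from the data, which is the kind of parameter sensitivity the abstract aims to reduce.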