Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (4): 554-565. DOI: 10.3778/j.issn.1673-9418.1906001

• Academic Research •

Clustering Algorithm Based on Density Peak and Neighbor Optimization

HE Yunbin, DONG Heng, WAN Jing, LI Song

  1. College of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
  • Online: 2020-04-01 Published: 2020-04-10

Abstract:

The density peak clustering algorithm has high time complexity when selecting cluster centers, requires the cutoff distance to be chosen manually, and may produce multiple density peaks on manifold data, which lowers clustering accuracy. To address these problems, a new density peak clustering algorithm is proposed and analyzed in terms of three aspects: cluster center selection, outlier filtering, and data point assignment. For cluster center selection, the density of each data point is computed using the KNN idea. Outlier screening and pruning, as well as data point assignment, exploit the properties of the Voronoi diagram together with the distribution characteristics of the data points. Finally, the idea of hierarchical clustering is applied to merge similar clusters and improve clustering accuracy. Experimental results show that the proposed algorithm achieves better clustering quality and accuracy than the algorithms it is compared with.

Key words: density clustering, Voronoi diagram, outliers, nearest neighbors
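
As an illustration of the center-selection step described in the abstract, the following is a minimal sketch of how a KNN-based density can replace the manually chosen cutoff distance. The abstract does not state the density formula, so the inverse mean distance to the k nearest neighbors used here is an assumption, not the authors' definition. The distance to the nearest denser point and the product ranking follow the standard density peak formulation. The function names `knn_density`, `peak_distance`, and `select_centers` are hypothetical, and the brute-force distance matrix is chosen for readability rather than speed. The Voronoi-based outlier pruning and the final cluster merging are not covered.

```python
import numpy as np

def knn_density(points, k):
    """Return (rho, dist): a KNN-based density per point and the distance matrix.

    Assumption: rho is the inverse of the mean distance to the k nearest
    neighbors. The paper's exact formula is not given in the abstract.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    # After sorting each row, column 0 is the point itself (distance 0).
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = 1.0 / (knn_dist.mean(axis=1) + 1e-12)
    return rho, dist

def peak_distance(rho, dist):
    """Distance from each point to its nearest neighbor of higher density."""
    n = len(rho)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            # Densest point: use its largest distance to any other point.
            delta[i] = dist[i].max()
    return delta

def select_centers(points, k, n_centers):
    """Rank points by rho * delta and return the indices of the top ones."""
    rho, dist = knn_density(points, k)
    delta = peak_distance(rho, dist)
    gamma = rho * delta
    return np.argsort(gamma)[::-1][:n_centers]
```

With two well-separated groups of points, `select_centers(points, k=2, n_centers=2)` would be expected to return one index from each group, since only the densest point of each group is far from every denser point.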