Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2016, Vol. 10 ›› Issue (11): 1614-1622. DOI: 10.3778/j.issn.1673-9418.1510049

• Artificial Intelligence and Pattern Recognition •

Density Peaks Clustering by Automatic Determination of Cluster Centers

LI Tao1, GE Hongwei1,2+, SU Shuzhi1

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. Ministry of Education Key Laboratory of Advanced Process Control for Light Industry (Jiangnan University), Wuxi, Jiangsu 214122, China
  • Online: 2016-11-01 Published: 2016-11-04

Abstract: Density peaks clustering is a recent density-based clustering algorithm that finds nonspherical clusters without requiring the number of clusters to be specified in advance. Its drawback is that the cluster centers must be determined manually. To address this, this paper proposes a density peaks clustering algorithm that determines the cluster centers automatically. First, two quantities are computed for each data point: its local density and its shortest distance to any point of higher density. Next, the cluster centers are determined automatically from the sorting graph. Finally, each remaining data point is assigned to the same cluster as its nearest neighbor of higher density, and noise points are identified according to the border density, which yields the clustering result. Experiments on synthetic data sets and UCI data sets that compare the new algorithm with the original density peaks algorithm show that the new algorithm not only determines the cluster centers automatically but also achieves higher accuracy.

Key words: clustering, density peaks, automatic clustering, density clustering
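
The steps named in the abstract (local density, shortest distance to a higher-density point, assignment to the nearest higher-density neighbor) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `density_peaks`, the cutoff distance `d_c`, and the cutoff-kernel density are assumptions for illustration. The abstract does not spell out how the sorting graph selects centers, so the sketch uses a hypothetical stand-in: it takes the `n_centers` points with the largest product of density and distance. Noise detection by border density is omitted for the same reason.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Illustrative density peaks clustering (not the paper's exact method)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff d_c.
    # (Cutoff kernel is an assumption; the abstract does not name a kernel.)
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Densest first; ties broken by index so "higher density" is a strict order.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n   # shortest distance to a point of higher density
    parent = [-1] * n   # index of that nearest higher-density point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no higher-density neighbor;
            # by convention it gets its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Stand-in for automatic center selection (an assumption): take the
    # n_centers largest values of rho * delta. The sort is stable, so the
    # densest point always comes first among ties.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_centers]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Visit points from densest to sparsest so every parent is labeled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Two well-separated square blobs, each with one point in the middle.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = density_peaks(pts, d_c=0.9, n_centers=2)
print(centers, labels)
```

On this toy input the two middle points have the highest density and are picked as centers, and every corner point inherits the label of the middle point of its own blob.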