Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue (4): 712-720. DOI: 10.3778/j.issn.1673-9418.1902022

Optimized Density Peak Clustering Algorithm by Adaptive Aggregation Strategy

QIAN Xuezhong, JIN Hui   

  1. Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2020-04-01 Published: 2020-04-10

自适应聚合策略优化的密度峰值聚类算法

钱雪忠, 金辉

  1. 江南大学 物联网工程学院 物联网技术应用教育部工程研究中心,江苏 无锡 214122

Abstract:

The density peak clustering (DPC) algorithm depends heavily on human intervention and is sensitive to its parameter: an improper choice of the cutoff distance dc leads to wrongly selected initial cluster centers, and in some cases, even when dc is set properly, the initial cluster centers are still difficult to pick manually from the decision graph. To overcome these defects, a new clustering algorithm based on density peaks is proposed. The algorithm first determines the local density of each data point using the idea of K-nearest neighbors. It then applies a new adaptive aggregation strategy, which first determines the initial cluster centers using a threshold given by the algorithm, then assigns each remaining point to its nearest initial cluster center, and finally merges similar clusters according to density reachability between clusters. In experiments on synthetic and real datasets, the algorithm performs better than the DPC, DBSCAN, KNNDPC and K-means algorithms, and it effectively improves clustering accuracy and quality.

Key words: density peak, K-nearest neighbor (KNN), local density, merging strategy, inter-cluster density reachability
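
The three stages named in the abstract (K-nearest-neighbor local density, threshold-based selection of initial centers with nearest-center assignment, and merging of density-reachable clusters) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration and not the authors' method: the function name `adaptive_dpc`, the density form exp(-mean squared distance to the k nearest neighbors), the threshold mean + one standard deviation of density × distance, and the use of mutual k-nearest neighbors as a stand-in for density reachability between clusters.

```python
import math
from statistics import mean, pstdev


def adaptive_dpc(points, k):
    """Illustrative density-peak clustering sketch (hypothetical, see lead-in)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of every point, excluding the point itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]

    # ASSUMPTION: local density from the k nearest neighbours only,
    # so no cutoff distance dc is needed.
    rho = [math.exp(-mean(dist[i][j] ** 2 for j in knn[i])) for i in range(n)]

    # Standard DPC delta: distance to the nearest point of higher density.
    # Ties are broken by index because sorted() is stable.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: farthest distance
        else:
            delta[i] = min(dist[i][j] for j in order[:rank])

    # ASSUMPTION: initial centers are points whose rho * delta exceeds
    # mean + one standard deviation; fall back to the densest point.
    gamma = [r * d for r, d in zip(rho, delta)]
    threshold = mean(gamma) + pstdev(gamma)
    centers = [i for i in range(n) if gamma[i] > threshold] or [order[0]]

    # Assign every point to its nearest initial center.
    labels = [min(centers, key=lambda c: dist[i][c]) for i in range(n)]

    # ASSUMPTION: two clusters are merged when they contain a pair of
    # mutual k-nearest neighbours (a stand-in for density reachability).
    parent = {c: c for c in centers}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:
                a, b = find(labels[i]), find(labels[j])
                if a != b:
                    parent[b] = a

    # Each label is the index of the representative center of its cluster.
    return [find(c) for c in labels]
```

Calling `adaptive_dpc(points, k)` on a list of coordinate tuples returns one label per point, where each label is the index of a representative center. The only user-supplied parameter in this sketch is `k`; the threshold is computed from the data, which mirrors the abstract's goal of avoiding manual selection from the decision graph.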
