计算机科学与探索 (Journal of Frontiers of Computer Science and Technology) ›› 2020, Vol. 14 ›› Issue (5): 792-802. DOI: 10.3778/j.issn.1673-9418.1904001

• Artificial Intelligence •

优化分配策略的密度峰值聚类算法

丁志成,葛洪伟   

  1. 江南大学 江苏省模式识别与计算智能工程实验室,江苏 无锡 214122
  2. 江南大学 物联网工程学院,江苏 无锡 214122
  • 出版日期:2020-05-01 发布日期:2020-05-08

Density Peaks Clustering with Optimized Allocation Strategy

DING Zhicheng, GE Hongwei   

  1. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online:2020-05-01 Published:2020-05-08

摘要:

针对密度峰值聚类算法在面对复杂结构数据集时容易出现分配错误的问题,提出一种优化分配策略的密度峰值聚类算法(ODPC)。新算法首先引入参数积[γ],扩大了聚类中心的选取范围;然后使用改进的数据点分配策略,对数据集的数据点进行基于相似度指标[MS]的重新分配,进一步优化了簇类中点集的分配;最后使用[dc]近邻法优化识别数据集的噪声点。在人工数据集及UCI真实数据集上的实验均可证明,新算法能够在优化噪声识别的同时,提高复杂流形数据集中数据点分配的正确率,并取得比DPC算法、DenPEHC算法、GDPC算法更好的聚类效果。

关键词: 密度聚类, 快速搜索与发现密度峰值聚类(DPC), 分配策略

Abstract:

To address the problem that the density peaks clustering (DPC) algorithm tends to misassign data points on data sets with complex structures, this paper proposes a density peaks clustering algorithm with an optimized allocation strategy (ODPC). First, the new algorithm introduces the parameter product [γ], which widens the range of candidate cluster centers. Then, an improved allocation strategy reassigns the data points according to the similarity index [MS], further optimizing the assignment of points within clusters. Finally, a [dc] nearest-neighbor method is used to better identify the noise points in the data set. Experiments on artificial data sets and real UCI data sets show that the new algorithm improves the accuracy of point allocation on complex manifold data sets while also improving noise recognition, and that it achieves better clustering results than the DPC (clustering by fast search and find of density peaks), DenPEHC (density peak based efficient hierarchical clustering), and GDPC (density peaks clustering algorithm with grid-division strategy) algorithms.

Key words: density clustering, clustering by fast search and find of density peaks (DPC), allocation strategy
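
The abstract names the parameter product [γ] without defining it. As a rough illustration only, the sketch below assumes it follows the standard DPC definition, γ_i = ρ_i · δ_i, where ρ_i is the local density (number of neighbors within the cutoff distance dc) and δ_i is the distance to the nearest point of higher density. The function name `dpc_gamma` and the toy data are hypothetical. The sketch does not cover the MS-based reallocation or the [dc] nearest-neighbor noise step of ODPC, which the abstract does not specify.

```python
import numpy as np

def dpc_gamma(points, dc):
    """Return (rho, delta, gamma) under the standard DPC cutoff-kernel definitions.

    Assumption: gamma = rho * delta, as in the original DPC formulation;
    the ODPC paper may define or use gamma differently.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distance matrix, shape (n, n)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Local density: count of other points closer than dc (subtract 1 for self)
    rho = (dist < dc).sum(axis=1) - 1
    n = len(pts)
    delta = np.empty(n)
    for i in range(n):
        higher = np.flatnonzero(rho > rho[i])
        if higher.size:
            # Distance to the nearest point with strictly higher density
            delta[i] = dist[i, higher].min()
        else:
            # Densest point(s): by convention, use the largest distance
            delta[i] = dist[i].max()
    gamma = rho * delta
    return rho, delta, gamma

# Toy data: two small groups, each with one central point (indices 4 and 8)
demo = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
        (5, 5), (5.1, 5), (5, 5.1), (5.05, 5.05)]
rho, delta, gamma = dpc_gamma(demo, dc=0.08)
# Candidate centers are the points with the largest gamma values
candidates = np.argsort(gamma)[-2:]
```

Ranking points by γ rather than thresholding ρ and δ separately lets a point with moderate density but large separation still surface as a candidate center, which is one plausible reading of "expanding the selection of cluster centers".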