Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (8): 1490-1500.DOI: 10.3778/j.issn.1673-9418.2102053

• Artificial Intelligence • Previous Articles     Next Articles

Research of Density Peaks Clustering Algorithm Based on Second-Order k Neighbors

WANG Dagang, DING Shifei, ZHONG Jin   

  1. 1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
    2. School of Computer Science and Technology, Hefei Normal University, Hefei 230601, China
  • Online:2021-08-01 Published:2021-08-02

基于二阶[k]近邻的密度峰值聚类算法研究

王大刚丁世飞钟锦   

  1. 1. 中国矿业大学 计算机科学与技术学院,江苏 徐州 221116
    2. 合肥师范学院 计算机学院,合肥 230601

Abstract:

Clustering by fast search and find of density peaks (DPC) is a new density clustering algorithm proposed in recent years. The core of the algorithm is based on local density and relative distance. By drawing a decision diagram, the cluster center is selected manually, and the clustering is completed. The DPC algorithm uses the cutoff distance to calculate the local density, and essentially only considers the number of neighboring nodes around it, and the algorithm uses a single-step allocation strategy, which limits the accuracy and effectiveness of the algorithm for any data set to a certain extent. To solve the above problems, this paper proposes an optimized density peaks clus-tering algorithm based on second-order k neighbors (SODPC). The algorithm calculates direct density and indirect density by introducing second-order k neighbors of nodes, redefines the calculation method of local density, and on this basis, it defines the multi-step  allocation strategy of non-central nodes  to complete the clustering. Through manual and real data tests, it is proven that the algorithm in this paper has a good clustering effect on irregular and uneven density data sets.

Key words: density peaks clustering, decision graph, second-order k neighbor, local density

摘要:

密度峰值聚类(DPC)是近年来提出的一种新的密度聚类算法,算法的核心是基于局部密度和相对距离,通过画出决策图,人为选定聚类中心,进而完成聚类。DPC算法利用截断距离计算局部密度,本质上只考虑了周围近邻节点的数量,且算法采用单步分配策略,一定程度上限制了算法对任意数据集的计算精度和有效性。针对上述问题,提出基于二阶[k]近邻的密度峰值聚类算法(SODPC)。算法通过引入节点的二阶[k]近邻,计算直接密度和间接密度,重新定义局部密度的计算方式。在此基础上,定义非中心节点的多步骤分配策略完成聚类。通过人工和真实数据的测试,证明了该算法对不规则、密度不均匀的数据集具有较好的聚类效果。

关键词: 密度峰值聚类, 决策图, 二阶[k]近邻, 局部密度