Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (4): 929-944. DOI: 10.3778/j.issn.1673-9418.2405064

• Theory·Algorithm •

Density Peak Clustering Algorithm Optimized by Weighted Shared Neighbors

ZHANG Wenjie, XIE Juanying   

  1. School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
  • Online: 2025-04-01 Published: 2025-03-28

Abstract: The local density definition of the DPC (clustering by fast search and find of density peaks) algorithm varies with the size of the dataset, the local density of a point is sensitive to the cutoff distance dc, and the single-step strategy for assigning the remaining points can trigger a “domino effect”, so DPC may fail to find the genuine clusters in a dataset. To address these limitations, this paper proposes a density peak clustering algorithm based on weighted shared neighbors (WSN-DPC). The algorithm uses a standard-deviation-weighted distance in place of the plain Euclidean distance, which highlights the contributions of different features to the distances between points. Shared-neighbor information is then used to define the similarity between points, from which the local density and relative distance of each point are defined, so as to reflect the true distribution of points in the dataset as closely as possible. Furthermore, distinct assignment strategies are applied in turn to outliers and non-outliers, so that each point is assigned, as far as possible, to its most appropriate cluster. Extensive experiments on multiple datasets, together with statistical significance tests, show that WSN-DPC outperforms DPC and its variants while addressing the limitations of DPC.
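
The quantities named in the abstract can be illustrated with a minimal Python sketch. The cutoff density and relative distance follow the classic DPC definitions. The abstract does not give the WSN-DPC formulas, so `std_weights` and `shared_neighbor_counts` are hypothetical helpers showing one plausible reading of "standard deviation weighted distance" and "shared neighbor information", not the authors' definitions.

```python
import numpy as np

def pairwise_dist(X, weights=None):
    """Euclidean distance matrix; optional per-feature weights scale the squared differences."""
    X = np.asarray(X, dtype=float)
    if weights is None:
        weights = np.ones(X.shape[1])
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((weights * diff ** 2).sum(axis=2))

def std_weights(X):
    """ASSUMPTION (not from the paper): weight each feature by its share of the
    total standard deviation. Requires at least one non-constant feature."""
    s = np.asarray(X, dtype=float).std(axis=0)
    return s / s.sum()

def dpc_density(D, dc):
    """Classic DPC cutoff density: number of other points closer than dc (dc > 0)."""
    return (D < dc).sum(axis=1) - 1  # subtract the point itself, since D[i, i] = 0

def dpc_delta(D, rho):
    """Classic DPC relative distance: distance to the nearest point of higher
    density; a point with no denser point gets its largest distance instead."""
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        higher = rho > rho[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return delta

def shared_neighbor_counts(D, k):
    """ASSUMPTION (not from the paper): similarity as the number of k-nearest
    neighbors two points have in common. Assumes no duplicate points, so that
    column 0 of the sorted order is the point itself."""
    n = D.shape[0]
    knn = np.argsort(D, axis=1)[:, 1:k + 1]
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1  # member[i, j] = 1 if j is a kNN of i
    return member @ member.T

if __name__ == "__main__":
    # Two small, well-separated groups of three points each (toy data).
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
    D = pairwise_dist(X)
    for dc in (2.0, 1.2):
        print("dc =", dc, "rho =", dpc_density(D, dc))
    print("delta =", dpc_delta(D, dpc_density(D, 1.2)))
    print("weighted D[0] =", pairwise_dist(X, std_weights(X))[0])
    print("shared 2-NN counts:\n", shared_neighbor_counts(D, 2))
```

On this toy data, changing dc from 2.0 to 1.2 changes which points tie for the highest density, which is the kind of cutoff sensitivity the abstract describes. A neighbor-count similarity depends on k rather than on a distance threshold.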

Key words: shared neighbor, local density, weighted distance, cluster center, clustering
