计算机科学与探索 ›› 2022, Vol. 16 ›› Issue (9): 2163-2176.DOI: 10.3778/j.issn.1673-9418.2102021

• 理论与算法 • 上一篇    

加权K近邻和多簇合并的密度峰值聚类算法

陈磊, 吴润秀(), 李沛武, 赵嘉   

  1. 南昌工程学院 信息工程学院,南昌 330099
  • 收稿日期:2021-02-05 修回日期:2021-04-02 出版日期:2022-09-01 发布日期:2021-04-19
  • 通讯作者: + E-mail: wurunxiu@tom.com
  • 作者简介:陈磊(1997—),男,硕士研究生,主要研究方向为数据挖掘。
    吴润秀(1971—),女,教授,硕士生导师,主要研究方向为群智能算法及应用。
    李沛武(1963—),男,博士,教授,硕士生导师,主要研究方向为计算机安全、大数据分析。
    赵嘉(1981—),男,博士,教授,硕士生导师,主要研究方向为数据挖掘、群智能算法。
  • 基金资助:
    江西省教育厅科技项目(GJJ180940);国家自然科学基金(52069014);国家自然科学基金(51669014);江西省杰出青年基金(2018ACB21029)

Weighted K-nearest Neighbors and Multi-cluster Merge Density Peaks Clustering Algorithm

CHEN Lei, WU Runxiu(), LI Peiwu, ZHAO Jia   

  1. School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China
  • Received:2021-02-05 Revised:2021-04-02 Online:2022-09-01 Published:2021-04-19
  • About author:CHEN Lei, born in 1997, M.S. candidate. His research interest is data mining.
    WU Runxiu, born in 1971, professor, M.S. supervisor. Her research interests include swarm intelligence algorithm and application.
    LI Peiwu, born in 1963, Ph.D., professor, M.S. supervisor. His research interests include computer security and big data analysis.
    ZHAO Jia, born in 1981, Ph.D., professor, M.S. supervisor. His research interests include data mining and swarm intelligence algorithm.
  • Supported by:
    Science and Technology Project of Jiangxi Province Department of Education(GJJ180940);National Natural Science Foundation of China(52069014);National Natural Science Foundation of China(51669014);Science Fund for Distinguished Young Scholars of Jiangxi Province(2018ACB21029)

摘要:

密度峰值聚类(DPC)算法是一种基于密度的聚类算法。该算法原理简单、运行高效,可以找到任意非球形类簇。但是该算法存在一些缺陷:首先,该算法局部密度定义的度量准则不统一且两者的聚类结果存在较大差异;其次,该算法的分配策略易产生分配连带错误,即一旦某一个样本分配错误,会导致后续一连串的样本分配错误。为解决这些问题,提出了一种加权$K$近邻和多簇合并的密度峰值聚类算法(WKMM-DPC)。该算法结合加权$K$近邻的思想,引入样本的权重系数,重新定义样本的局部密度,使局部密度更加依赖于K近邻内样本的位置,且统一了密度定义的度量准则;定义了类簇间的相似度,并据此度量准则进行多簇合并,以避免分配剩余样本时的分配连带错误。在人工和UCI数据集上的实验表明,该算法的聚类效果优于FKNN-DPC、DPCSA、FNDPC、DPC和DBSCAN算法。

关键词: 聚类, 局部密度, 密度峰值, K近邻(KNN), 多簇合并

Abstract:

Density peaks clustering (DPC) algorithm is a clustering algorithm based on density. The algorithm is simple in principle and efficient in operation, and can find any non-spherical class clusters. However, there are some defects in the algorithm. Firstly, the measurement criteria defined by the local density are not uniform and there are great differences in the clustering results. Secondly, the allocation strategy is prone to allocation errors, that is once a sample is incorrectly allocated, a series of subsequent samples will be incorrectly allocated too. In order to solve these problems, this paper proposes a weighted K-nearest neighbors and multi-cluster merge density peaks clustering (WKMM-DPC) algorithm. Combined with the idea of weighted K-nearest neighbors, the local density of the sample is redefined by introducing the weight coefficient of the sample, which makes the local density more dependent on the position of the sample in the K-nearest neighbors, and unifies the measurement criteria of density definition. The similarity between clusters is defined, and the clusters are merged according to the metric to avoid the joint error in the allocation of remaining samples. Experiments on artificial and UCI datasets show that the clustering performance of the proposed algorithm is better than that of FKNN-DPC, DPCSA, FNDPC, DPC and DBSCAN algorithms.

Key words: clustering, local density, density peaks, K-nearest neighbors (KNN), multi-cluster merge

中图分类号: