Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (8): 1910-1922. DOI: 10.3778/j.issn.1673-9418.2111138

• Theory and Algorithm •

去中心化加权簇归并的密度峰值聚类算法

赵力衡1,+, 王建1,2, 陈虹君1

  1. 成都锦城学院 电子信息学院,成都 611731
  2. 四川大学 计算机学院,成都 610041
  • 收稿日期:2021-11-30 修回日期:2022-03-24 出版日期:2022-08-01 发布日期:2022-08-19
  • Corresponding author: + E-mail: 1503233800@qq.com
  • 作者简介:赵力衡(1976—),男,四川成都人,硕士,高级工程师,CCF专业会员,主要研究方向为数据挖掘、海量数据存储。
    王建(1979—),男,四川泸州人,博士,副教授,主要研究方向为人工智能、数据挖掘。
    陈虹君(1979—),女,四川广安人,硕士,教授,主要研究方向为大数据、人工智能。
  • 基金资助:
    教育部协同育人项目(201902005069);四川省科技厅重点研发项目(22ZDYF0724)

Density-Peak Clustering Algorithm on Decentralized and Weighted Clusters Merging

ZHAO Liheng1,+, WANG Jian1,2, CHEN Hongjun1

  1. Department of Electronic Information Engineering, Chengdu Jincheng College, Chengdu 611731, China
  2. School of Computer, Sichuan University, Chengdu 610041, China
  • Received: 2021-11-30 Revised: 2022-03-24 Online: 2022-08-01 Published: 2022-08-19
  • About the authors: ZHAO Liheng, born in 1976, M.S., senior engineer, professional member of CCF. His research interests include data mining and massive data storage.
    WANG Jian, born in 1979, Ph.D., associate professor. His research interests include artificial intelligence and data mining.
    CHEN Hongjun, born in 1979, M.S., professor. Her research interests include big data and artificial intelligence.
  • Supported by:
    the Collaborative Education Project of the Ministry of Education of China (201902005069); the Key Research and Development Project of the Sichuan Provincial Science and Technology Department (22ZDYF0724)

摘要:

快速搜索和寻找密度峰值聚类算法(DPC)是近年来提出的一种基于密度的聚类算法,具有原理简单、无需迭代并能实现任意形状聚类的优点。但该算法仍存在一些缺陷:围绕聚类中心点聚类,使聚类结果受中心点影响显著,且聚类中心点数量仍需人为指定;截断距离仅考虑了数据的分布密度,忽略了数据的内部特征;聚类过程中若有样本存在分配错误,会导致其后续样本聚类出现跟随错误。针对上述问题,尝试提出一种去中心化加权簇归并的密度峰值聚类算法(DCM-DPC)。该算法引入权重系数重新定义了局部密度,并由此划分出位于不同局部高密度区域的核心样本组,用于取代聚类中心点成为聚类的依据。最后将剩余样本按其近邻样本所在类簇的众数,或分配到最高耦合的核心样本组代表的类簇中或标注为离散点以完成聚类。在人工和UCI数据集上的实验结果表明,提出算法的聚类效果优于对比算法,对相互纠缠的类簇的边界样本划分也更加精确。

关键词: 密度峰值, 聚类, 去中心点, 邻域, 簇归并

Abstract:

Clustering by fast search and find of density peaks (DPC) is a recently proposed density-based clustering algorithm. It is simple in principle, requires no iteration, and can find clusters of arbitrary shape. However, the algorithm still has several defects: because clustering is built around cluster centers, the results depend strongly on those centers, and the number of centers must still be specified manually; the cutoff distance considers only the distribution density of the data and ignores its internal features; and if a sample is assigned incorrectly during clustering, the samples assigned after it inherit the error. To address these problems, this paper proposes a density-peak clustering algorithm based on decentralized and weighted cluster merging (DCM-DPC). The algorithm introduces a weight coefficient to redefine the local density and uses it to identify core sample groups located in different local high-density regions, which replace cluster centers as the basis for clustering. Finally, according to the mode of the clusters to which its neighboring samples belong, each remaining sample is either assigned to the cluster represented by the most strongly coupled core sample group or labeled as a discrete point. Experiments on synthetic and UCI datasets show that the proposed algorithm outperforms the comparison algorithms and partitions the boundary samples of entangled clusters more accurately.

Key words: density peaks, clustering, decentralized, neighborhood, clusters merging
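
For orientation, the baseline DPC procedure that the abstract criticizes can be sketched as follows. This is a minimal illustration of the standard density-peaks scheme (cutoff-kernel local density, distance to the nearest denser sample, manually chosen number of centers), not the DCM-DPC algorithm proposed in the paper. The function names `dpc_quantities` and `dpc_cluster`, the choice of the product of density and distance as the center score, and the index-based tie-breaking are assumptions made only for this example.

```python
import math


def dpc_quantities(points, d_c):
    """Return (rho, delta, nearest_higher) for baseline DPC (illustrative names)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other samples closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit samples from densest to sparsest; ties are broken by index (assumption).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest sample: by convention, delta is its largest distance to any sample.
            delta[i] = max(dist[i])
            continue
        # Nearest sample among those that precede i in the density ordering.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j
    return rho, delta, nearest_higher


def dpc_cluster(points, d_c, n_centers):
    """Label samples with baseline DPC; n_centers is supplied by hand."""
    rho, delta, nearest_higher = dpc_quantities(points, d_c)
    n = len(points)
    # Centers: the n_centers samples with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_centers]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Assign in decreasing density so each sample's reference is already labeled.
    # A wrong label here is copied by every sample that later refers to it.
    for i in sorted(range(n), key=lambda i: (-rho[i], i)):
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(dpc_cluster(pts, d_c=1.0, n_centers=2))
```

The final loop makes two of the defects listed in the abstract concrete: the result depends entirely on which samples are chosen as centers, and each non-center sample simply copies the label of its nearest denser sample, so one misassignment is passed on to the samples that follow.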

CLC Number: