计算机科学与探索 ›› 2022, Vol. 16 ›› Issue (12): 2809-2819.DOI: 10.3778/j.issn.1673-9418.2104019

• 人工智能 • 上一篇    下一篇

最近邻的密度峰值聚类标签传播算法

宋鹏1,2, 葛洪伟1,2,+()   

  1. 1.江南大学 人工智能与计算机学院,江苏 无锡 214122
    2.江苏省模式识别与计算智能工程实验室(江南大学),江苏 无锡 214122
  • 收稿日期:2021-04-08 修回日期:2021-05-26 出版日期:2022-12-01 发布日期:2021-05-18
  • 通讯作者: +E-mail: ghw8601@163.com
  • 作者简介:宋鹏(1996—),男,辽宁丹东人,硕士研究生,CCF学生会员,主要研究方向为模式识别、机器学习。
    葛洪伟(1967—),男,江苏无锡人,博士,教授,博士生导师,主要研究方向为人工智能、模式识别、机器学习、图像处理与分析等。
  • 基金资助:
    国家自然科学基金(61806006);江苏高校优势学科建设工程资助项目。

Nearest Neighbor Label Propagation for Density Peak Clustering

SONG Peng1,2, GE Hongwei1,2,+()   

  1. 1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
    2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan Uni-versity, Wuxi, Jiangsu 214122, China
  • Received:2021-04-08 Revised:2021-05-26 Online:2022-12-01 Published:2021-05-18
  • About author:SONG Peng, born in 1996, M.S. candidate, stu-dent member of CCF. His research interests include pattern recognition and machine learning.
    GE Hongwei, born in 1967, Ph.D., professor, Ph.D. supervisor. His research interests include artificial intelligence, pattern recognition, machine learning, image processing and analysis, etc.
  • Supported by:
    National Natural Science Foundation of China(61806006);Priority Academic Development Program of Jiangsu Higher Education Institutions.

摘要:

基于动态图的密度峰值聚类标签传播算法(DPC-DLP)是密度峰值聚类算法(DPC)的一种改进算法,该算法涉及的相关参数过于复杂,并且算法在每次迭代时都会使用标签数据,会出现标签错误扩大化现象,存在迭代次数过多导致聚类效果恶化等问题。针对上述问题,提出了一种最近邻的密度峰值聚类标签传播算法(DPC-NLP)。该算法主要有三个步骤:首先利用局部密度和最小距离对样本点进行打分,根据分数确定聚类中心,然后使用聚类中心的标签在其最近邻内形成簇骨干,最后使用最近邻的标签传播方法将簇骨干的标签传播到剩余样本上,并形成最终的聚类结果。最近邻标签传播算法充分考虑数据间的结构关联性情况,并在传播的过程中不断更新数据的状态,利用更充分的信息提高分配正确率。在人工和真实数据集上对算法进行验证,并与目前主流的聚类算法进行比较,实验结果表明,DPC-NLP在性能和鲁棒性方面表现优越,并可以处理流形和非线性等复杂数据。

关键词: 聚类, 密度峰值聚类, 标签传播, 最近邻

Abstract:

Dynamic graph-based label propagation for density peaks clustering (DPC-DLP) is an improved algo-rithm of density peaks clustering (DPC). The related parameters involved in the algorithm are too complex and the algorithm uses labeled data in each iteration, which will lead to the expansion of label errors and the deterioration of clustering effect due to too many iterations. To solve the above problems, this paper proposes a nearest neighbor label propagation for density peak clustering (DPC-NLP), which mainly has three steps. Firstly, the local density and the minimum distance are used to score the sample points and the clustering center is determined according to the score. Then, the clustering backbone is formed by the labels of the clustering center in its nearest neighbor. Finally, the label propagation method based on the nearest neighbor is used to propagate the labels of the clustering back-bone to the remaining samples and the final clustering results are formed. The nearest neighbor label propagation algorithm takes full account of the structural association between samples, constantly updates the state of data in the process of propagation, and uses more sufficient information to improve the accuracy of allocation. The algorithm is verified on the synthetic and real-world datasets and compared with the current mainstream clustering algorithms. Ex-perimental results show that DPC-NLP is superior in performance and robustness and can deal with complex data such as manifold and nonlinear data.

Key words: clustering, density peaks clustering, label propagation, nearest neighbor

中图分类号: