Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (2): 355-366.DOI: 10.3778/j.issn.1673-9418.2110025

• Theory·Algorithm • Previous Articles     Next Articles

Adaptive K-means Algorithm Combining Nearest-Neighbor Matrix and Local Density

Ailiminuer·Kuerban, XIE Juanying, YAO Ruoxia   

  1. School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
  • Online:2023-02-01 Published:2023-02-01

融合最近邻矩阵与局部密度的自适应K-means聚类算法

艾力米努尔·库尔班,谢娟英,姚若侠   

  1. 陕西师范大学 计算机科学学院,西安 710119

Abstract: To overcome the deficiencies of traditional K-means algorithms which are sensitive to the initial cluster centers and outliers, and their variants introducing densities, which need giving arbitrary parameters, this paper proposes an adaptive K-means clustering algorithm combining the nearest neighbor matrix and local density. Ins-pired by the nearest-neighbors and the density peaks, the adjacency matrix is constructed by introducing the distance difference between objects. Then the local density is calculated without any parameters except for the adjacency matrix. After that, the initial centers and the number of clusters of K-means are simultaneously determined by using the nearest-neighbor matrix and local density, and the rest objects are assigned as well. Experiments on synthetic datasets, and on real world datasets from UCI machine learning repository, and the comparisons with traditional K-means algorithm, and the improved K-means algorithms based on outliers or densities all demonstrate that the pro-posed adaptive K-means algorithm is robust to outliers on synthetic datasets, and obtains more accurate clustering results for real world datasets from UCI machine learning repository.

Key words: adaptive K-means clustering algorithm, density peak principle, nearest-neighbor principle, local density

摘要: 针对传统K-means聚类算法对初始聚类中心和离群孤立点敏感的缺陷,以及现有引入密度概念优化的K-means算法均需要设置密度参数或阈值的缺点,提出一种融合最近邻矩阵与局部密度的自适应K-means聚类算法。受最邻近吸收原则与密度峰值原则启发,通过引入数据对象间的距离差异值构造邻近矩阵,根据邻近矩阵计算局部密度,不需要任何参数设置,采取最近邻矩阵与局部密度融合策略,自适应确定初始聚类中心数目和位置,同时完成非中心点的初分配。人工数据集和UCI数据集的实验测试,以及与传统K-means算法、基于离群点改进的K-means算法、基于密度改进的K-means算法的实验比较表明,提出的自适应K-means算法对人工数据集的孤立点免疫度较高,对UCI数据集具有更准确的聚类结果。

关键词: 自适应K-means聚类算法, 密度峰值原则, 最邻近吸收原则, 局部密度