K-medoids Clustering Algorithms with Optimized Initial Seeds by Density Peaks

doi:10.3778/j.issn.1673-9418.1506072

Abstract

The fast K-medoids and the variance-based K-medoids clustering algorithms require the number of clusters to be specified manually, and their initial seeds may fall in the same cluster or fail to cover all clusters of a dataset. Inspired by the density peak clustering algorithm, this paper proposes two K-medoids clustering algorithms that determine the number of clusters adaptively. The algorithms define the local density ρ_i of a point x_i as the reciprocal of the sum of the distances between x_i and its t nearest neighbors, define a new distance δ_i for x_i, and plot a decision graph of the distance δ_i against the local density ρ_i. Points that have high local density and lie far from one another appear in the upper right corner of the decision graph, away from most other points in the dataset. These points are chosen as the initial seeds for K-medoids, so the seeds lie in different clusters and the number of clusters is determined automatically as the number of seeds. To further improve the clustering, a new measure (criterion) function is proposed: the ratio of the intra-cluster distance to the inter-cluster distance. The two algorithms are tested on real datasets from the UCI machine learning repository and on synthetic datasets. The results are compared in terms of the initial seeds selected, the number of iterations, clustering time, Rand index, Jaccard coefficient, Adjusted Rand index, and clustering accuracy. The experiments show that the proposed K-medoids algorithms recognize the true number of clusters of a dataset, find proper initial seeds, reduce the number of iterations and the clustering time, improve clustering accuracy, and are robust to noise.

Key words: clustering, K-medoids algorithm, initial seeds, density peak, measure function
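The seed-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the formula for δ_i or the rule for picking points from the decision graph, so the sketch makes two assumptions: δ_i follows the original density peak clustering definition (distance to the nearest point of higher density, and the largest distance for the densest point), and a point counts as "upper right" when ρ_i is above the mean and δ_i is more than two standard deviations above the mean. The function name `density_peak_seeds` and the threshold rule are hypothetical.

```python
import math
import random


def density_peak_seeds(points, t=5):
    """Return (rho, delta, seeds) for a list of coordinate tuples.

    rho follows the abstract: reciprocal of the summed distance to the
    t nearest neighbors. delta and the selection rule are assumptions.
    Assumes no point has t exact duplicates (the sum would be zero).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: 1 / (sum of distances to the t nearest neighbors).
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:t]
        rho.append(1.0 / sum(nearest))

    # Assumed delta: distance to the nearest point of higher density.
    # Visiting points in descending density order breaks ties consistently.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])  # densest point: largest distance
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(dist[i][j] for j in order[:rank])

    # Hypothetical "upper right corner" rule: high rho and unusually high delta.
    mean_rho = sum(rho) / n
    mean_delta = sum(delta) / n
    std_delta = math.sqrt(sum((d - mean_delta) ** 2 for d in delta) / n)
    seeds = [i for i in range(n)
             if rho[i] > mean_rho and delta[i] > mean_delta + 2 * std_delta]
    return rho, delta, seeds


# Two well-separated synthetic blobs of 30 points each.
rng = random.Random(0)
blob_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(30)]
blob_b = [(rng.gauss(10.0, 0.3), rng.gauss(10.0, 0.3)) for _ in range(30)]
rho, delta, seeds = density_peak_seeds(blob_a + blob_b, t=5)
print(len(seeds), seeds)
```

The number of returned seeds plays the role of the automatically determined cluster count. The seeds themselves would serve as the initial medoids for a standard K-medoids iteration.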

XIE Juanying, QU Yanan. K-medoids Clustering Algorithms with Optimized Initial Seeds by Density Peaks[J]. Journal of Frontiers of Computer Science and Technology, 2016, 10(2): 230-247.

