K-medoids Clustering Algorithms with Optimized Initial Seeds by Variance

doi:10.3778/j.issn.1673-9418.1409062

Abstract

The fast K-medoids clustering algorithm has two weaknesses: computing the density of points is expensive, and several of its initial seeds may fall in the same cluster. The neighborhood-based K-medoids algorithm has a further weakness: the coefficient that adjusts its neighborhood radius is chosen arbitrarily. To overcome these deficiencies, this paper proposes two new variance-based K-medoids clustering algorithms. The two algorithms use, respectively, the mean distance between instances and the standard deviation of a specific instance as the neighborhood radius. Each selects initial seeds one by one, taking the instances with minimum variance such that the distance between initial seeds is at least the neighborhood radius, until the expected number of initial seeds has been obtained. The proposed algorithms are tested on real datasets from the UCI machine learning repository and on synthetically generated datasets, and their performance is compared using several popular clustering criteria. The experimental results demonstrate that the proposed K-medoids clustering algorithms obtain better clusterings in a short time and scale to comparatively large datasets.

Key words: variance; standard deviation; neighborhood; initial seeds; K-medoids clustering
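The seed-selection idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so several details here are assumptions rather than the authors' definitions: the "variance" of an instance is taken to be its mean squared Euclidean distance to all other instances, the neighborhood radius is the mean pairwise distance (the first of the two variants; the standard-deviation variant is not shown), and the function names `pairwise_distances` and `variance_seeds` are hypothetical.

```python
import math


def pairwise_distances(points):
    """Return the full Euclidean distance matrix as a list of lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def variance_seeds(points, k):
    """Pick up to k initial medoid indices (illustrative sketch only).

    Assumption: the variance of instance i is its mean squared distance
    to all other instances; the paper's exact definition may differ.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    dist = pairwise_distances(points)

    # Assumed per-instance variance: a small value means a dense region.
    variance = [sum(d * d for d in row) / (n - 1) for row in dist]

    # Neighborhood radius: mean distance over all distinct pairs.
    n_pairs = n * (n - 1) / 2
    radius = sum(dist[i][j] for i in range(n) for j in range(i + 1, n)) / n_pairs

    seeds = []
    # Visit instances from smallest to largest variance and keep those that
    # lie at least one radius away from every seed chosen so far.
    for i in sorted(range(n), key=lambda idx: variance[idx]):
        if all(dist[i][s] >= radius for s in seeds):
            seeds.append(i)
            if len(seeds) == k:
                break
    return seeds


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    print(variance_seeds(data, 2))
```

In this sketch the function may return fewer than k indices when no remaining instance is far enough from the chosen seeds; how the paper handles that case is not stated in the abstract. The returned indices would then serve as the starting medoids for an ordinary K-medoids iteration.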

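Abstract: The fast K-medoids clustering algorithm suffers from complex, time-consuming density computation and from initial cluster centers that may lie in the same cluster, and the neighborhood-based K-medoids algorithm suffers from the subjectivity of a manually specified coefficient for adjusting the neighborhood radius. To address these defects, two K-medoids algorithms with variance-optimized initial centers are proposed. They take, respectively, the mean distance between samples and the standard deviation of the corresponding sample as the neighborhood radius, use variance as the measure of how densely samples are distributed, and select as the initial cluster centers of K-medoids the samples with the smallest variance whose mutual distances are no less than the neighborhood radius. Experiments on UCI datasets and on synthetic datasets, with comparisons across various clustering indices, show that the algorithms require little clustering time, produce good clustering results, and are suitable for clustering fairly large datasets.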


XIE Juanying, GAO Rui. K-medoids Clustering Algorithms with Optimized Initial Seeds by Variance[J]. Journal of Frontiers of Computer Science and Technology, 2015, 9(8): 973-984.


