Stable K Multiple-Means Clustering Algorithm

doi:10.3778/j.issn.1673-9418.1912012

Abstract:

To improve the performance of K-means on non-convex clusters, the multiple-means clustering method with specified K clusters (KMM) partitions the data into multiple subclusters. It formalizes multiple-means clustering as an optimization problem and achieves better clustering results. However, KMM is sensitive to its initial prototypes, and because it selects them at random, its clustering results are unstable. To address these problems, this paper proposes a stable K multiple-means clustering algorithm and briefly analyzes its computational complexity and convergence. The algorithm first builds a graph from the first-neighbor (nearest-neighbor) relationships among the data samples. It then divides the data into groups according to the connected components of this graph and takes the mean of each group as an initial prototype. Finally, it solves the optimization problem by alternating iteration to obtain the clustering result. Experiments on synthetic and real data sets show that the proposed algorithm produces more stable and better clustering results.
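The prototype initialization summarized above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes Euclidean distance and links each sample to its single nearest other sample (its "first neighbor") with an undirected edge. It then finds the connected components of that graph with union-find and returns each component's mean as an initial prototype. The function name `first_neighbor_prototypes` is hypothetical, the paper's exact graph construction may differ, and the subsequent alternating optimization of KMM is not shown.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def first_neighbor_prototypes(points: List[Point]) -> List[List[float]]:
    """Hypothetical sketch: initial prototypes from first-neighbor graph components."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples to define first neighbors")

    # Step 1: first neighbor of each sample (assumption: Euclidean distance,
    # brute-force O(n^2) search; ties go to the lowest index).
    first = []
    for i in range(n):
        best_j, best_d = -1, math.inf
        for j in range(n):
            if j != i:
                d = math.dist(points[i], points[j])
                if d < best_d:
                    best_j, best_d = j, d
        first.append(best_j)

    # Step 2: union-find over the undirected edges i -- first[i].
    # Two samples that share a first neighbor end up in the same component
    # through that shared neighbor, so no extra edges are needed for components.
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(first):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    # Step 3: collect the members of each connected component.
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Step 4: the mean of each component is one initial prototype.
    dim = len(points[0])
    return [
        [sum(points[i][k] for i in members) / len(members) for k in range(dim)]
        for members in groups.values()
    ]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
    print(first_neighbor_prototypes(data))
```

Every step is deterministic, so repeated runs on the same data return the same prototypes. This determinism is what removes the run-to-run variation caused by random prototype selection. On the five example points, the sketch forms two groups, {(0,0), (0,1)} and {(10,10), (10,11), (20,0)}, with means (0, 0.5) and (40/3, 7). The brute-force neighbor search is quadratic in the number of samples. For large data sets, a spatial index such as a k-d tree would be a natural substitute.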

Keywords: clustering, K multiple-means clustering (KMM), prototype initialization

ZHANG Nini, GE Hongwei. Stable K Multiple-Means Clustering Algorithm[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(5): 941-948.
