Journal of Frontiers of Computer Science and Technology, 2021, Vol. 15, Issue (5): 941-948. DOI: 10.3778/j.issn.1673-9418.1912012

• Artificial Intelligence •

Stable K Multiple-Means Clustering Algorithm

ZHANG Nini, GE Hongwei   

  1. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2021-05-01 Published: 2021-04-30

稳定的K-多均值聚类算法

张倪妮, 葛洪伟

  1. 江苏省模式识别与计算智能工程实验室(江南大学),江苏 无锡 214122
  2. 江南大学 物联网工程学院,江苏 无锡 214122

Abstract:

The multiple-means clustering method with specified K clusters (KMM) improves on K-means for nonconvex clusters by partitioning the original data into multiple subclasses and formalizing multiple-means clustering as an optimization problem, which yields better clustering results. However, KMM is sensitive to its initial prototypes, and selecting them at random makes the clustering results unstable. To address these problems, a stable K multiple-means clustering algorithm is proposed, and its computational complexity and convergence are briefly analyzed. The algorithm constructs a graph from the first-neighbor (nearest-neighbor) relationship among the data samples, divides the data into groups according to the connected components of that graph, and takes the mean of each group as an initial prototype. The optimization problem is then solved by alternating iteration to obtain the final clustering result. Experiments on artificial and real data sets show that the proposed algorithm produces more stable and better clustering results.

Key words: clustering, multiple-means clustering method with specified K clusters (KMM), prototype initialization
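
The prototype initialization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `first_neighbor_prototypes`, the Euclidean distance, and the lowest-index tie-breaking rule are assumptions made here for illustration, and the subsequent alternating-iteration step is not shown. Each sample is linked to its nearest other sample, the connected components of the resulting graph are found with a union-find structure, and the mean of each component is returned as an initial prototype.

```python
import math


def first_neighbor_prototypes(points):
    """Return (prototypes, labels) from the first-neighbor graph.

    Assumptions (not stated in the abstract): Euclidean distance,
    ties broken in favor of the lowest sample index.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    # Union-find over sample indices to track connected components.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add an undirected edge between each sample and its nearest other sample.
    for i in range(n):
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        parent[find(i)] = find(nearest)

    # Relabel components as 0..m-1 in order of first appearance.
    roots = {}
    labels = [roots.setdefault(find(i), len(roots)) for i in range(n)]

    # The mean point of each component serves as an initial prototype.
    dim = len(points[0])
    sums = [[0.0] * dim for _ in roots]
    counts = [0] * len(roots)
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    prototypes = [[s / counts[c] for s in sums[c]] for c in range(len(roots))]
    return prototypes, labels


# Two well-separated pairs of points give two groups and two prototypes.
prototypes, labels = first_neighbor_prototypes(
    [(0, 0), (0, 2), (10, 10), (10, 12)]
)
print(labels)      # [0, 0, 1, 1]
print(prototypes)  # [[0.0, 1.0], [10.0, 11.0]]
```

Because the grouping depends only on the data and not on a random seed, repeated runs return the same initial prototypes, which is the source of the stability the abstract refers to. The brute-force neighbor search here takes quadratic time and is meant only to keep the sketch short.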

摘要:

指定K个聚类的多均值聚类算法在K-均值算法的基础上设置了多个次类,以改善K-均值算法在非凸数据集上的劣势,并将多均值聚类问题形式化为优化问题,可以得到更优的聚类效果。但是该算法对初始原型敏感,且随机选取原型的方式使聚类结果不稳定。针对上述问题,提出一种稳定的K-多均值聚类算法,并对该算法的复杂度与收敛性进行了简要讨论。该算法先基于数据样本的最邻近关系构造图,根据图的连通分支将数据分为若干组,取每组数据的均值点作为初始原型,再用交替迭代的方法对优化问题进行求解,得到最后的聚类结果。在人工数据集和真实数据集上的实验表明,该算法具有更稳定更优越的聚类效果。

关键词: 聚类, K-多均值聚类(KMM), 原型初始化