计算机科学与探索 ›› 2016, Vol. 10 ›› Issue (5): 732-741.DOI: 10.3778/j.issn.1673-9418.1509024

• 人工智能与模式识别 • 上一篇    下一篇

加权局部方差优化初始簇中心的K-means算法

蔡宇浩,梁永全,樊建聪+,李  璇,刘文华   

  1. 山东科技大学 信息科学与工程学院,山东 青岛 266590
  • 出版日期:2016-05-01 发布日期:2016-05-04

Optimizing Initial Cluster Centroids by Weighted Local Variance in K-means Algorithm

CAI Yuhao, LIANG Yongquan, FAN Jiancong+, LI Xuan, LIU Wenhua   

  1. College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China
  • Online:2016-05-01 Published:2016-05-04

摘要: 在传统K-means算法中,初始簇中心选择的随机性,导致聚类结果随不同的聚类中心而不同。因此出现了很多簇中心的选择方法,但是很多已有的簇中心选择算法,其聚类结果受参数调节的影响较大。针对这一问题,提出了一种新的初始簇中心选择算法,称为WLV-K-means(weighted local variance K-means)。该算法采用加权局部方差度量样本的密度,以更好地发现密度高的样本,并利用改进的最大最小法,启发式地选择簇初始中心点。在UCI数据集上的实验结果表明,WLV-K-means算法不仅能够取得较好的聚类结果,而且受参数变化的影响较小,有更加稳定的表现。

关键词: K-means算法, 方差, 加权, 最大最小法, 簇初始中心点

Abstract: The selection of initial cluster centroids in the classical K-means algorithm is random, which causes that the clustering results vary with different selections of cluster centroids. Thereby many selection approaches of initial centroids are devised and applied. However, most of them are affected by parameters design and parameter values. To overcome this problem, this paper proposes a novel initial cluster centroids selection algorithm, called WLV-K-means (weighted local variance K-means). The WLV-K-means algorithm employs the weighted local variance to measure the density of each sample, which can find samples with higher density. This algorithm also uses the improved max-min method to select cluster centroid heuristically. The experiments are made on UCI datasets and the results show that the WLV-K-means algorithm outperforms some improved K-means algorithms and is more stable and robust.

Key words: K-means algorithm, variance, weighting, max-min method, initial cluster centroids