Journal of Frontiers of Computer Science and Technology ›› 2015, Vol. 9 ›› Issue (5): 611-620. DOI: 10.3778/j.issn.1673-9418.1409053

• Artificial Intelligence and Pattern Recognition •

K-medoids Clustering Algorithms with Optimized Initial Seeds by Granular Computing

XIE Juanying+, LU Xiaoxiao, QU Yanan, GAO Hongchao

  1. School of Computer Science, Shaanxi Normal University, Xi’an 710062, China
  • Online: 2015-05-01 Published: 2015-05-06

Abstract: The fast K-medoids clustering algorithm may select initial seeds that lie in the same cluster, and the granular computing based K-medoids clustering algorithm requires a subjectively chosen threshold to construct the defuzzified similarity matrix. To overcome these two defects, this paper proposes two new K-medoids clustering algorithms whose initial seeds are optimized by granular computing. The proposed algorithms combine granular computing with max-min distance means to select, as initial seeds, K instances that lie in dense regions of the sample distribution and are far apart from each other, and they use the mean similarity over all instances as the threshold for constructing the defuzzified similarity matrix. The algorithms are tested on synthetically generated datasets and on datasets from the UCI machine learning repository. The experimental results, evaluated in terms of clustering accuracy, Adjusted Rand Index and other clustering evaluation indices, show that the proposed algorithms give more stable clustering results and outperform the traditional K-medoids algorithm, the fast K-medoids algorithm and the previous granular computing based K-medoids clustering algorithm.

Key words: granular computing, initial seeds, max-min distance means, K-medoids clustering algorithm
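
The seeding idea summarized in the abstract (dense, mutually distant seeds, with the mean similarity as the threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the similarity definition (one minus normalized Euclidean distance), the granule-size density measure, the candidate rule (granule size at least average), and the function name `select_initial_seeds` are all assumptions made here for illustration only.

```python
import numpy as np


def select_initial_seeds(X, k):
    """Pick k seeds that are dense and mutually far apart (illustrative sketch)."""
    X = np.asarray(X, dtype=float)

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # ASSUMPTION: similarity = 1 - distance / max distance, so values lie in [0, 1].
    max_d = dist.max()
    sim = 1.0 - dist / max_d if max_d > 0 else np.ones_like(dist)

    # Threshold taken as the mean of all similarity entries (diagonal included),
    # standing in for the data-driven threshold mentioned in the abstract.
    threshold = sim.mean()

    # Defuzzified (0/1) similarity matrix; row i marks the granule around sample i.
    granules = sim >= threshold
    density = granules.sum(axis=1)

    # ASSUMPTION: candidates are samples whose granule is at least average size.
    candidates = np.flatnonzero(density >= density.mean())
    if k > len(candidates):
        raise ValueError("k exceeds the number of dense candidate samples")

    # First seed: the densest candidate.
    seeds = [int(candidates[np.argmax(density[candidates])])]

    # Remaining seeds: max-min distance rule, i.e. pick the candidate whose
    # distance to its nearest already-chosen seed is largest.
    while len(seeds) < k:
        min_d = dist[np.ix_(candidates, seeds)].min(axis=1)
        seeds.append(int(candidates[np.argmax(min_d)]))
    return seeds
```

The returned indices would then serve as the starting medoids for an ordinary K-medoids iteration (assign each sample to its nearest medoid, update medoids, repeat until no change). Restricting the max-min search to dense candidates is one simple way to keep outliers from being chosen as seeds; the paper's own selection rule may differ.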