Journal of Frontiers of Computer Science and Technology ›› 2010, Vol. 4 ›› Issue (11): 1019-1026. DOI: 10.3778/j.issn.1673-9418.2010.11.007

• Academic Research •

A k-means-based Algorithm for Soft Subspace Clustering*

ZHANG Yanping, JIANG Qingshan+   

  1. School of Software, Xiamen University, Xiamen, Fujian 361005, China
  • Online: 2010-11-01 Published: 2010-11-01
  • Corresponding author: JIANG Qingshan

Abstract: Soft subspace clustering is an important and active branch of clustering research. Clustering in high-dimensional space is difficult because the data are sparsely distributed and subject to the curse of dimensionality. After analyzing the limitations of existing soft subspace clustering algorithms, this paper introduces the concept of subspace difference. A new objective function is then designed that combines subspace difference with within-cluster compactness, and a new k-means-type soft subspace clustering algorithm is proposed on this basis. The algorithm requires no additional parameters to be set during clustering. Theoretical analysis and experimental results show that it achieves better clustering accuracy than other soft subspace clustering algorithms.

Key words: high-dimensional data, k-means, soft subspace clustering, subspace difference
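
For readers unfamiliar with the k-means-type soft subspace framework that the abstract builds on, the following is a minimal sketch of a generic attribute-weighting variant, not the algorithm proposed in the paper. It assumes the common scheme in which each cluster keeps its own feature weights that sum to 1, the objective is the weighted within-cluster dispersion with weights squared, and each weight is therefore set proportional to the inverse of the cluster's dispersion along that dimension. The subspace-difference term is not reproduced because the abstract does not define it. The function name `soft_subspace_kmeans`, the fixed exponent 2, the deterministic initialization, and the `eps` guard are all illustrative assumptions.

```python
def soft_subspace_kmeans(points, k, n_iter=20, eps=1e-9):
    """Generic k-means-type soft subspace clustering (illustrative sketch).

    Assumed objective: sum over clusters l, members x, dimensions j of
    w[l][j]**2 * (x[j] - c[l][j])**2, with sum_j w[l][j] == 1.
    """
    n, dim = len(points), len(points[0])
    # Assumption: deterministic initialization from evenly spaced points.
    centers = [list(points[i * n // k]) for i in range(k)]
    weights = [[1.0 / dim] * dim for _ in range(k)]
    labels = [0] * n

    def wdist(p, l):
        # Weighted squared distance from point p to the center of cluster l.
        return sum(weights[l][j] ** 2 * (p[j] - centers[l][j]) ** 2
                   for j in range(dim))

    for _ in range(n_iter):
        # Step 1: assign each point to the nearest center in its weighted subspace.
        labels = [min(range(k), key=lambda l: wdist(p, l)) for p in points]
        for l in range(k):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if not members:
                continue  # keep the previous center and weights for an empty cluster
            # Step 2: update the center as the per-dimension mean.
            centers[l] = [sum(p[j] for p in members) / len(members)
                          for j in range(dim)]
            # Step 3: update weights; compact dimensions get larger weights.
            disp = [sum((p[j] - centers[l][j]) ** 2 for p in members) + eps
                    for j in range(dim)]
            inv = [1.0 / d for d in disp]
            total = sum(inv)
            weights[l] = [v / total for v in inv]
    return labels, centers, weights


# Toy data: the first cluster is compact along dimension 0,
# the second cluster is compact along dimension 1.
points = [(0.0, 0.0), (0.1, 5.0), (-0.1, -5.0), (0.0, 2.0),
          (10.0, 10.0), (15.0, 10.1), (5.0, 9.9), (12.0, 10.0)]
labels, centers, weights = soft_subspace_kmeans(points, k=2)
print(labels)  # -> [0, 0, 0, 0, 1, 1, 1, 1]
```

On this toy data each cluster ends up placing almost all of its weight on the dimension along which it is compact, which is the "soft subspace" that the weighting is meant to expose. Note that this generic scheme fixes an exponent on the weights, whereas the abstract states that the proposed algorithm needs no additional parameters.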
