Journal of Frontiers of Computer Science and Technology, 2014, Vol. 8, Issue (8): 933-944. DOI: 10.3778/j.issn.1673-9418.1403063


Adaptive Entropy Algorithm for Projective Clustering

WU Tao, CHEN Lifei+   

  1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China
  • Online: 2014-08-01  Published: 2014-08-07

自适应熵的投影聚类算法

吴  涛,陈黎飞+   

  1. 福建师范大学 数学与计算机科学学院,福州 350007

Abstract: Owing to the curse of dimensionality, many traditional algorithms cannot cluster high-dimensional data effectively. In recent years, projective clustering methods have attracted wide interest, and soft subspace clustering methods in particular have been widely studied and applied. However, most existing algorithms require users to set several important parameters in advance and ignore the optimization of the projected subspace, which degrades clustering performance. To address these problems, this paper proposes a new objective function that aims both to minimize the within-cluster dispersion and to optimize the projected subspace associated with each cluster. A new expression for computing feature weights is derived mathematically, and on this basis a new adaptive projective clustering algorithm is defined within the framework of classical k-means. During clustering, the optimal parameter values are calculated automatically from the dataset and the derived formula. The experimental results show that the proposed algorithm significantly improves clustering quality and outperforms the existing projective clustering algorithms it is compared with.

Key words: high-dimensional data, clustering, projected subspace, adaptability, feature weighting

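Abstract (translated from the Chinese version): Owing to the "curse of dimensionality", many traditional clustering methods perform poorly when applied to high-dimensional data. Projective clustering methods have attracted wide attention in recent years, and soft subspace clustering in particular has been widely studied and applied. However, most existing projective subspace clustering algorithms require users to set several important parameters in advance and do not consider the optimization of each cluster's projected subspace, which lowers their clustering performance. To address this, a new optimization objective function is defined that optimizes the subspace in which each cluster lies while minimizing the within-cluster dispersion. A new method for computing feature weights is obtained by mathematical derivation, and an adaptive k-means-type projective clustering algorithm is proposed. During clustering, the algorithm computes each optimization parameter dynamically from information in the dataset itself and from the derived formulas. Experimental results show that the new algorithm improves clustering quality by optimizing the projected subspace and that its performance is clearly better than that of existing projective clustering algorithms.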

Keywords (translated from the Chinese version): high-dimensional data, clustering, projected subspace, adaptivity, feature weights
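
The abstract describes a k-means-type algorithm in which each cluster carries its own feature weights. This record does not give the paper's objective function or its adaptive parameter formula, so the following is only a minimal sketch of the general idea, written in the style of entropy-weighted soft subspace k-means. The function name `entropy_weighted_kmeans`, the fixed user-supplied parameter `gamma`, and the toy data are all assumptions of this sketch; the paper's algorithm differs precisely in that it computes such parameters automatically.

```python
import numpy as np


def entropy_weighted_kmeans(X, k, gamma=1.0, n_iter=50, seed=0):
    """Soft subspace k-means with entropy-regularized feature weights.

    Assumption of this sketch: gamma is a constant chosen by the user.
    The paper's adaptive algorithm derives its parameters from the data.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize centers from k distinct data points.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    # One weight vector per cluster, initially uniform over dimensions.
    weights = np.full((k, d), 1.0 / d)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Assignment step: weighted squared distance to each center, shape (n, k).
        sq = (X[:, None, :] - centers[None, :, :]) ** 2
        dist = np.einsum("nkd,kd->nk", sq, weights)
        labels = dist.argmin(axis=1)

        for c in range(k):
            members = X[labels == c]
            if len(members) == 0:
                continue  # empty cluster: keep its previous center and weights
            # Center update: per-dimension mean of the cluster members.
            centers[c] = members.mean(axis=0)
            # Per-dimension dispersion of the cluster around its center.
            disp = ((members - centers[c]) ** 2).sum(axis=0)
            # Weight update: softmax of -disp / gamma (shifted for stability),
            # so dimensions with small dispersion receive large weights.
            w = np.exp(-(disp - disp.min()) / gamma)
            weights[c] = w / w.sum()

    return labels, centers, weights


# Toy data (hypothetical): each cluster is tight in a different dimension.
rng = np.random.default_rng(1)
a = np.column_stack([rng.normal(0, 0.1, 100), rng.uniform(-5, 5, 100)])
b = np.column_stack([rng.uniform(-5, 5, 100), rng.normal(3, 0.1, 100)])
X = np.vstack([a, b])
labels, centers, weights = entropy_weighted_kmeans(X, k=2, gamma=5.0)
print(weights.round(3))  # one row of feature weights per cluster
```

In this sketch the quality of the result depends on the choice of `gamma`: small values concentrate each cluster's weight on very few dimensions, while large values push the weights toward uniform. That sensitivity to a manually set parameter is the kind of problem the abstract says the proposed algorithm is designed to avoid.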