计算机科学与探索 ›› 2018, Vol. 12 ›› Issue (4): 642-652.DOI: 10.3778/j.issn.1673-9418.1706059

• 人工智能与模式识别 • 上一篇    下一篇

图松弛优化聚类的快速近似提升方法

谢    磊+,王士同   

  1. 江南大学 数字媒体学院,江苏 无锡 214122
  • 出版日期:2018-04-01 发布日期:2018-04-04

Fast Approximate Approaches for Graph-Based Relaxed Optimization Clustering

XIE Lei+, WANG Shitong   

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online:2018-04-01 Published:2018-04-04

摘要: 基于图松弛优化为非近似迭代方法提供了有效的分析解决方案,且实现简单。然而,由于矩阵的逆在计算时需要多项式时间,则在运行速度方面不是很理想,当面对较大规模数据时此方法将变得不可行。提出了对基于图松弛优化聚类进行快速近似提升的两种方法:一个是基于k均值聚类,另一个是基于随机投影树。广泛实验表明,这些算法在运算速度方面表现较优,聚类精度变化非常小。具体来讲,该算法在运算大规模数据时精度优于k均值算法,并且在保证精度的情况下运行速度远快于基于图松弛优化聚类算法。值得注意的是,该算法可以使得单个机器在数分钟内对具有数百万样本的数据集进行聚类。

关键词: 无监督学习, 基于图松弛优化聚类, 数据量化, 高维数据, 快速近似

Abstract: Due to its easy implementation, the graph-based relaxed optimization indeed provides an effective analytical solution for non-approximation iterative methods. However, due to the inverse of the matrix, such an optimization will run slowly and even become impractical for large-scale data. This paper develops two general approaches for fast graph-based relaxed optimization clustering. One is based on k-means clustering, and the other is based on random projection tree. Extensive experiments show that these two proposed approaches can achieve significant acceleration without degrading the clustering accuracy a lot. In particular, the approaches have better clustering performance than the classical k-mean algorithm on large-scale data, and run faster than the graph-based relaxed optimization clustering algorithms, with comparable accuracy. It is worth noting that the proposed approaches in this paper allow a single machine to cluster millions of data samples within minutes.

Key words: unsupervised learning, graph-based clustering, data quantization, high dimensional data, fast approximate