Incremental Spectral Clustering with Closeness of Shared Nearest Neighbors

doi:10.3778/j.issn.1673-9418.1901045

Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue (6): 996-1004. DOI: 10.3778/j.issn.1673-9418.1901045

Incremental Spectral Clustering with Closeness of Shared Nearest Neighbors

ZHAO Mengmeng, WANG Shitong

1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
2. Key Laboratory of Media Design and Software Technology of Jiangsu Province, Jiangnan University, Wuxi, Jiangsu 214122, China

Online:2020-06-01 Published:2020-06-04

Abstract:

Spectral clustering based on the closeness of shared nearest neighbors has attracted growing attention because it captures the latent similarity relationships between data points well and is robust on data sets whose clusters are not completely separated. However, it remains costly in both running time and memory: it runs slowly and can fail outright on large data sets, which makes it impractical for large-scale data. To overcome these drawbacks, this paper proposes an incremental version of the algorithm. The basic idea is to first decompose the entire data set into several subsets and then run the algorithm on each subset in an incremental way, which preserves good clustering performance. Extensive experiments on artificial and simulated data sets demonstrate the effectiveness of the proposed algorithm. The algorithm also has low time consumption and high clustering accuracy, and it can effectively cluster data sets that keep growing.

Key words: spectral clustering, shared nearest neighbor, incremental, large-scale data
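
The two ingredients named in the abstract, a similarity built from shared nearest neighbors and a decomposition of the data into subsets, can be illustrated with a minimal sketch. The abstract does not give the paper's closeness formula or its incremental update rule, so the code below assumes the common definition in which the similarity of two points is the number of k-nearest neighbors they share. The function names `knn_sets`, `snn_similarity`, and `chunks` are hypothetical, and the spectral step (eigendecomposition of the graph Laplacian) is omitted.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbors (itself excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def snn_similarity(points, k):
    """Symmetric matrix whose (i, j) entry counts the k-nearest neighbors shared by i and j.

    Assumption: this plain shared-neighbor count stands in for the paper's
    closeness measure, which the abstract does not specify.
    """
    nn = knn_sets(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim


def chunks(points, size):
    """Yield consecutive subsets of at most `size` points, as an incremental run would consume them."""
    for start in range(0, len(points), size):
        yield points[start:start + size]


# Two well-separated groups of three points each.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
sim = snn_similarity(data, k=2)
batches = list(chunks(data, size=4))
```

In this toy example, points in the same group share a neighbor while points in different groups share none, which is the property that makes such a similarity useful for clusters that are close together. Building the full matrix takes memory quadratic in the number of points, which is the cost that processing one subset at a time is meant to limit.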

ZHAO Mengmeng, WANG Shitong. Incremental Spectral Clustering with Closeness of Shared Nearest Neighbors[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(6): 996-1004.

