Adaptive Clustering Algorithm for Mining Subspace Clusters in High-Dimensional Data Stream*

Journal of Frontiers of Computer Science and Technology, 2010, Vol. 4, Issue 9: 859-864. DOI: 10.3778/j.issn.1673-9418.2010.09.009

• Academic Research •

REN Jiadong¹,², ZHOU Weiwei¹⁺, HE Haitao¹

1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-09-09 Published: 2010-09-09
Contact: ZHOU Weiwei

Abstract

Abstract: Clustering high-dimensional data streams is an active research topic in data mining. Because data streams are large in volume, fast-changing, and high-dimensional, many clustering algorithms cannot achieve good clustering quality. This paper proposes SAStream, a new adaptive algorithm for mining subspace clusters in high-dimensional data streams. It improves the micro-cluster structure of HPStream and defines candidate clusters. The algorithm computes the distance from each newly arriving data point only to the centroids of the candidate clusters, within the corresponding subspaces, rather than to all clusters, which reduces the number of clusters examined during clustering. The resulting clusters are stored in a pyramidal time frame, and a time fading function discounts historical data so that expired clusters are removed. When the data rate is high, the limiting radius (LimitingRadius) and the cluster selection factor are adjusted automatically according to the monitored system resource usage, which in turn adjusts the clustering granularity. Experimental results show that the algorithm achieves good clustering quality with high processing speed.

Key words: high-dimensional data stream, subspace clustering, data rate, adaptive
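
The assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how candidate clusters are selected or how the limiting radius is adapted, so the candidate list and the radius are simply passed in by the caller. The names `MicroCluster`, `fade`, `subspace_distance`, and `assign` are hypothetical, and the exponential fading form 2^(-λ·Δt) is assumed here because it is the form used by HPStream.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MicroCluster:
    centroid: List[float]      # centroid coordinates in the full space
    dims: List[int]            # indices of the dimensions forming the cluster's subspace
    weight: float = 1.0        # faded number of points absorbed so far
    last_update: float = 0.0   # time of the most recent update


def fade(weight: float, dt: float, lam: float) -> float:
    """Discount a weight by 2^(-lam * dt) (assumed fading form)."""
    return weight * 2.0 ** (-lam * dt)


def subspace_distance(point: List[float], cluster: MicroCluster) -> float:
    """Euclidean distance restricted to the cluster's own dimensions,
    normalized by the number of dimensions in that subspace."""
    sq = sum((point[d] - cluster.centroid[d]) ** 2 for d in cluster.dims)
    return math.sqrt(sq / len(cluster.dims))


def assign(point: List[float], candidates: List[MicroCluster],
           limiting_radius: float, now: float, lam: float) -> Optional[MicroCluster]:
    """Absorb `point` into the nearest candidate cluster, or return None
    when no candidate lies within the limiting radius."""
    best = min(candidates, key=lambda c: subspace_distance(point, c), default=None)
    if best is None or subspace_distance(point, best) > limiting_radius:
        return None  # caller would start a new cluster here
    # Fade the old weight first, then fold the new point into the centroid.
    w = fade(best.weight, now - best.last_update, lam)
    best.centroid = [(c * w + p) / (w + 1.0) for c, p in zip(best.centroid, point)]
    best.weight = w + 1.0
    best.last_update = now
    return best
```

Only the clusters in `candidates` are examined, which is where the reduction in distance computations comes from. An adaptive variant would change `limiting_radius` between calls according to the observed load, a step this sketch leaves out.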

CLC Number: TP301.6

REN Jiadong, ZHOU Weiwei, HE Haitao. Adaptive Clustering Algorithm for Mining Subspace Clusters in High-Dimensional Data Stream[J]. Journal of Frontiers of Computer Science and Technology, 2010, 4(9): 859-864.
