Semi-supervised Clustering Algorithm for Complex Distributed Data

doi:10.3778/j.issn.1673-9418.1507102

Journal of Frontiers of Computer Science and Technology, 2016, Vol. 10, Issue 7: 1003-1009.


ZHANG Junxi1+, WU Xiaojun2, JIANG Jianghong3

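1. College of Vehicle Engineering, Xi’an Aeronautical University, Xi’an 710077, China
2. College of Automation, Northwestern Polytechnical University, Xi’an 710047, China
3. School of Computer Science, Shaanxi Normal University, Xi’an 710062, China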

Online: 2016-07-01  Published: 2016-07-01


Abstract

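Semi-supervised clustering is a machine learning method that uses prior information to refine the clustering process. This paper introduces the cellular automata (CA) distance transform algorithm into semi-supervised clustering: a planar distance transform partitions the dataset into several subclusters, yielding the number of clusters and the constraint information, which serve as prior information for the next clustering phase. The semi-supervised K-means algorithm then further partitions the first-phase result to obtain the complete set of cluster centers and the number of clusters. On this basis, the CA-K-means two-phase clustering algorithm is proposed. Comparative simulation experiments on 3 artificial datasets and 3 standard UCI datasets compare CA-K-means with the semi-supervised K-means algorithm, the genetic K-means algorithm and the pure CA hierarchical clustering algorithm. The results show that the proposed algorithm achieves higher clustering accuracy on complex distributed data and better overall clustering performance.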
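The two phases described in the abstract can be illustrated with a small sketch. It is not the authors' method: the abstract does not give the rules of the CA planar distance transform, so phase 1 substitutes a simpler, hypothetical cellular automaton. It rasterizes 2-D points onto a grid of side `cell`, marks cells holding at least `min_pts` points as live, and lets each live cell repeatedly adopt the smallest label in its 3×3 neighborhood until the labels stop changing, so connected dense regions become subclusters. Phase 2 is a constrained K-means in which the phase-1 labels fix both the number of clusters and the initial centers, and seeded points are never reassigned. The names `ca_subclusters`, `constrained_kmeans`, `cell` and `min_pts` are illustrative assumptions.

```python
import math
from collections import defaultdict


def ca_subclusters(points, cell=1.0, min_pts=2, max_steps=1000):
    """Phase 1 (hypothetical stand-in, not the paper's CA distance transform):
    label connected dense grid cells with a min-label cellular automaton."""
    # Rasterize the 2-D points: grid cell -> indices of the points inside it.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(math.floor(x / cell), math.floor(y / cell))].append(i)
    # A cell is "live" if it holds at least min_pts points; each live cell
    # starts with its own label.
    live = sorted(c for c, idx in cells.items() if len(idx) >= min_pts)
    label = {c: n for n, c in enumerate(live)}
    # Synchronous CA rule: every live cell takes the smallest label in its
    # 3x3 (Moore) neighborhood, itself included, until nothing changes.
    for _ in range(max_steps):
        new = {
            (cx, cy): min(label[(cx + dx, cy + dy)]
                          for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                          if (cx + dx, cy + dy) in label)
            for (cx, cy) in label
        }
        if new == label:
            break
        label = new
    # Renumber components 0..k-1; points in sparse cells stay unlabeled (None).
    ids = {lab: n for n, lab in enumerate(sorted(set(label.values())))}
    seeds = [None] * len(points)
    for c, lab in label.items():
        for i in cells[c]:
            seeds[i] = ids[lab]
    return seeds, len(ids)


def _mean(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def constrained_kmeans(points, seeds, k, max_iter=100):
    """Phase 2 sketch: K-means seeded and constrained by the phase-1 labels."""
    if k == 0:
        raise ValueError("phase 1 produced no seed clusters")
    # Initial centers are the means of the seed groups, so K is not guessed.
    centers = [_mean([p for p, s in zip(points, seeds) if s == j]) for j in range(k)]
    assign = list(seeds)
    for _ in range(max_iter):
        # Seeded points keep their phase-1 label; the rest go to the nearest center.
        new = [s if s is not None else
               min(range(k), key=lambda j: math.dist(p, centers[j]))
               for p, s in zip(points, seeds)]
        if new == assign:
            break
        assign = new
        # Each cluster keeps its seed points, so no cluster can become empty.
        centers = [_mean([p for p, a in zip(points, assign) if a == j]) for j in range(k)]
    return assign, centers


pts = [(0.1, 0.1), (0.3, 0.2), (0.2, 0.4), (5.1, 5.0), (5.3, 5.2), (5.2, 5.4),
       (1.4, 1.2), (4.0, 4.1)]
seeds, k = ca_subclusters(pts, cell=1.0, min_pts=2)
labels, centers = constrained_kmeans(pts, seeds, k)
print(seeds, k)  # [0, 0, 0, 1, 1, 1, None, None] 2
print(labels)    # [0, 0, 0, 1, 1, 1, 0, 1]
```

Here the two dense blobs become seed groups 0 and 1, and the two isolated points, which fall in sparse cells, are only labeled in phase 2. Unlike the second phase described in the paper, this sketch does not split phase-1 subclusters further; it only assigns the leftover points and refines the centers.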

Key words: cellular automata, semi-supervised clustering, K-means clustering algorithm, CA-K-means two-phase clustering, complex distribution


ZHANG Junxi, WU Xiaojun, JIANG Jianghong. Semi-supervised Clustering Algorithm for Complex Distributed Data[J]. Journal of Frontiers of Computer Science and Technology, 2016, 10(7): 1003-1009.
