计算机科学与探索 ›› 2023, Vol. 17 ›› Issue (12): 2880-2895.DOI: 10.3778/j.issn.1673-9418.2208003

• 理论·算法 • 上一篇    下一篇

自适应阈值约束的密度簇主干聚类算法

张锦宏,陈梅,张弛   

  1. 兰州交通大学 电子与信息工程学院,兰州 730070
  • 出版日期:2023-12-01 发布日期:2023-12-01

Density Backbone Clustering Algorithm Based on Adaptive Threshold

ZHANG Jinhong, CHEN Mei, ZHANG Chi   

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  • Online:2023-12-01 Published:2023-12-01

摘要: 针对现有聚类算法识别任意簇时精度不足、对簇内数据点密度变化敏感、对异常点敏感以及阈值取值难以确定等问题,提出了自适应阈值约束的密度簇主干聚类算法(DCBAT)。该算法首先结合偏度系数和数据点密度均值定义了数据点密度可达自适应阈值,在该阈值的约束下将具有较高局部密度和较高相对距离的核心点按密度可达性分组,进而得到初始簇主干。接着将非核心数据点归并到其密度较大的最近邻所在簇中,得到初始簇。最后结合簇内密度差均值和比例系数定义了密度差自适应阈值,在该阈值的约束下于簇内点密度变化剧烈处拆分初始簇,得到最终簇。DCBAT在充分考虑数据分布特点和内部结构特点的情况下进行聚类,从而提高了聚类性能。与五个优秀算法k-means、DBSCAN、OPTICS、CFDP和MulSim在八个不同维度、不同类型的数据集上的实验结果表明,DCBAT算法具有识别任意簇效果佳、对簇内点密度变化不敏感、对异常点不敏感、聚类结果精确且稳定等特点,综合性能优于对比算法。

关键词: 聚类, 簇主干, 密度可达自适应阈值, 密度差自适应阈值, 任意簇

Abstract: The existing clustering algorithms are inaccurate to identify arbitrary clusters, sensitive to density changes within clusters, sensitive to outliers and difficult to determine the threshold. An adaptive threshold-constrained density cluster backbone clustering algorithm (DCBAT) is proposed to solve the problems. Firstly, the adaptive reachability density threshold is defined in combination with the skewness coefficient and points density mean. Under the constraint of the threshold, the core points with higher local densities and higher relative distances are grouped according to the reachability, and the initial clusters backbones are obtained. The non-core points are then assigned into the cluster which their nearest neighbors with higher density belong to. Finally, the adaptive density D-value threshold is proposed in combination with D-value mean and scale factor. According to the threshold, the initial cluster is separated at the point where the density varies sharply, and the final clusters are obtained. DCBAT fully considers the internal structure and distribution of the data when clustering, thereby improving the clustering performance. The performance of this algorithm is demonstrated compared with five excellent algorithms k-means,DBSCAN, OPTICS, CFDP and MulSim on eight datasets with various dimensions and types. DCBAT algorithm has the advantages of good recognition of arbitrary clusters, insensitivity to density changes within clusters, insensitivity to outliers and stable clustering result. Its overall performance is superior to comparison algorithms.

Key words: clustering, cluster backbone, reachability adaptive threshold of density, adaptive threshold of density D-value, arbitrary clusters