计算机科学与探索 ›› 2023, Vol. 17 ›› Issue (8): 1879-1892.DOI: 10.3778/j.issn.1673-9418.2205032

• 理论·算法 • 上一篇    下一篇

融合相对密度和最近邻关系的密度峰值聚类

王威娜,朱钰,任艳   

  1. 1. 吉林化工学院 信息与控制工程学院,吉林 132022
    2. 沈阳航空航天大学 人工智能学院,沈阳 110136
  • 出版日期:2023-08-01 发布日期:2023-08-01

Density Peaks Clustering Combining Relative Local Density and Nearest Neighbor Relationship

WANG Weina, ZHU Yu, REN Yan   

  1. 1. College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132022, China
    2. College of Artificial Intelligence, Shenyang Aerospace University, Shenyang 110136, China
  • Online:2023-08-01 Published:2023-08-01

摘要: 密度峰值算法在处理密度不均匀的数据时对中心点的选取不准确,并在样本分配时易产生连带错误,导致聚类效果不佳。针对上述问题,提出一种融合相对局部密度和最近邻关系的密度峰值聚类算法。在局部密度的定义中引入稀疏平和权重,提出相对局部密度的定义,根据相对局部密度寻找密度峰值,避免稀疏差异较大的数据集在选取密度峰值时出现的错误,确保中心点选择的正确性;针对分配策略,结合最邻近点准则和阈值限制,提出最近邻分配策略,根据阈值条件有效抑制分配连带错误;基于类内距离均值定义距离比例,提出修正分配策略,提升算法对边界点聚类的准确性。在5个合成数据集和5个UCI数据集上,将提出算法与DPC、DPC-MND、FKNN-DPC、DBSCAN、OPTICS、AP、K-means算法进行比较,实验结果表明,所提算法在调整互信息、调整兰德系数和Fowlkes-Mallows指数上均表现出良好的聚类效果,并通过Friedman检验表明该算法具有最优的性能。

关键词: 聚类算法, 密度峰值, 相对局部密度, 最近邻关系, 分配策略

Abstract: When the density peaks algorithm deals with datasets with different densities, the wrong center points may be selected, and the problem of associated errors may occur in the sample allocation process. To solve the above problems, a density peaks clustering algorithm based on the relative local density and nearest neighbor relationship is proposed. The weights of sparse balance are introduced into the definition of local density, and the definition of relative local density is proposed. The density peak can be found according to the relative local density, which avoids the error of selecting the density peak in the dataset with large sparse differences, and ensures the accuracy of the center point selection. The nearest neighbor allocation strategy is proposed by combining the nearest neighbor criterion and threshold limit to suppress the allocation error effectively. The modified allocation strategy based on the mean value of the distance within the class is proposed to enhance the accuracy of the algorithm for boundary point clustering. The proposed algorithm is compared with DPC, DPC-MND, FKNN-DPC, DBSCAN, OPTICS, AP, and K-means algorithms on 5 synthetic datasets and 5 UCI datasets, and the experimental results demonstrate that the proposed algorithm has sound clustering performance in metrics of adjusted mutual information, adjusted Rand index, and Fowlkes-Mallows index. Friedman test shows that the algorithm has the best performance.

Key words: clustering algorithm, density peaks, relative local density, nearest neighbor relations, allocation strategy