计算机科学与探索

• 学术研究 •    

融合相对密度和最近邻关系的密度峰值聚类

王威娜,朱钰,任艳   

  1. 1.吉林化工学院 信息与控制工程学院,吉林 吉林 132022
    2.沈阳航空航天大学 人工智能学院, 沈阳 110136

Density Peaks Clustering Combining Relative Local Density and Nearest Neighbor Re-lationship

WANG Weina, ZHU Yu, REN Yan   

  1. 1.College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, Jilin 13200, China
    2.College of Automation, Shenyang Aerospace University, Shenyang, 110136, China

摘要: 密度峰值算法在处理密度不均匀的数据时对中心点的选取不准确,并在样本分配时易产生连带错误,导致聚类效果不佳。针对上述问题,提出一种融合相对局部密度和最近邻关系的密度峰值聚类算法。在局部密度的定义中引入稀疏平和权重,提出相对局部密度的定义,根据相对局部密度寻找密度峰值,避免稀疏差异较大的数据集在选取密度峰值时出现的错误,确保中心点选择的正确性;针对分配策略,结合最邻近点准则和阈值限制,提出最近邻分配策略,根据阈值条件有效抑制分配连带错误;基于类内距离均值定义距离比例,提出修正分配策略,提升算法对边界点聚类的准确性。在5个合成数据集和5个UCI数据集上,将提出算法与DPC、DPC-MND、FKNN-DPC、DBSCAN、OPTICS、AP、K-Means算法进行比较,实验结果表明所提算法在调整互信息、调整兰德系数和Fowlkes-Mallows指数上均表现出良好的聚类效果,并通过Friedman检验表明该算法具有最优的性能。

关键词: 聚类算法, 密度峰值, 相对局部密度, 最近邻关系, 分配策略

Abstract: When the density peaks algorithm deals with datasets with different densities, the wrong center points may be selected, and the problem of associated errors may occur in the sample allocation process. To solve the above problems, a density peaks clustering algorithm based on the relative local density and nearest neighbor relationship is proposed. The relative local density is proposed by using the weights of sparse balance to avoid the error of selecting density peaks and to ensure the accuracy of center point selection. The nearest neighbor allocation strategy is proposed by combining with the nearest neighbor criterion and threshold limit to suppress the allocation error effectively. The modified allocation strategy based on the mean value of the distance within the class is proposed to enhance the accuracy of the algorithm for boundary point clustering. In the experimental part, the proposed algorithm is compared with DPC, DPC-MND, FKNN-DPC, DBSCAN, OPTICS, AP, and K-means algorithms on 5 syntheticdatasets and 5 UCI datasets, and the experimetal results demonstrate that the proposed algorithm has sound clustering performance in metrics of adjusted mutual information, adjusted rand index, and fowlkes mallows index. Friedman test shows that the algorithm has the best performance.

Key words: clustering algorithm, density peaks, relative local density, nearest neighbor relations, allocation strategy