计算机科学与探索 ›› 2024, Vol. 18 ›› Issue (3): 659-673.DOI: 10.3778/j.issn.1673-9418.2212041

• 理论·算法 • 上一篇    下一篇

平滑非负低秩图表示聚类算法

钱罗雄,陈梅,张弛,张锦宏,马学艳   

  1. 兰州交通大学 电子与信息工程学院,兰州 730070
  • 出版日期:2024-03-01 发布日期:2024-03-01

Smooth Non-negative Low-Rank Graph Representation for Clustering

QIAN Luoxiong, CHEN Mei, ZHANG Chi, ZHANG Jinhong, MA Xueyan   

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  • Online:2024-03-01 Published:2024-03-01

摘要: 针对现有低秩图表示算法在构建表示图时未能精确捕获数据的全局表示结构、未能充分利用数据有效信息指导表示图的构建以及构建的表示图不具有适于聚类的连通结构等问题,提出了平滑非负低秩图表示聚类算法(SNLRR)。SNLRR采用一种更符合矩阵秩特性的对数行列式函数代替核范数平滑地估计秩,有效降低矩阵较大奇异值对秩估计的影响,平衡了所有奇异值对秩估计的贡献比重,增强秩估计的准确性,从而更精准地捕获数据的全局表示结构。为了更加准确地捕获数据局部表示结构,SNLRR引入距离正则项为每个数据点自适应地分配最优近邻学习表示矩阵。此外,SNLRR对表示矩阵的拉普拉斯矩阵施加秩约束,使最终学习到的表示图具有与簇个数相同数量的连通分量,即表示图具有适于聚类的连通结构。与八个对比算法在七个高维且分布复杂的数据集上的实验结果显示,SNLRR算法的聚类性能均优于八种对比算法,Accuracy平均提高了0.207 3,NMI平均提高了0.175 8。因此,SNLRR是一个能够有效处理维度高且分布复杂数据的图表示聚类算法。

关键词: 聚类, 低秩表示, 秩约束, 对数行列式低秩

Abstract: The existing low-rank graph representation algorithms fail to capture the global representation structure of data accurately, and cannot make full use of the valid information of data to guide the construction of the representation graph, then the constructed representation graph does not have a connected structure suitable for clustering. A smooth non-negative low-rank graph representation method for clustering (SNLRR) is proposed to solve these problems. To more accurately capture the global representation structure of data, SNLRR uses a logarithmic determinant function that is more consistent with the rank characteristics of the matrix to replace the kernel norm to estimate the rank function smoothly, which can effectively reduce the impact of larger singular values of the matrix on the rank estimation, balance the contribution of all singular values to the rank estimation, enhance the accuracy of the rank estimation, so as to more accurately capture the global representation structure of the data. The distance regularization term is also introduced to adaptively assign the optimal nearest neighbor learning representation matrix for each data point to capture the local representation structure of data. Besides, SNLRR applies rank constraint on the Laplace matrix of representation matrix so that the learned representation graph has the same number of connected components as the real number of clusters, that is, the resulting representation graph has a interconnected structure suitable for clustering. Experimental results on seven datasets with high dimensions and complex distribution, using eight comparison algorithms, show that the clustering performance of SNLRR algorithm is better than that of the eight comparison algorithms, with an average increase of 0.2073 in accuracy and 0.1758 in NMI. Therefore, SNLRR is a graph representation clustering algorithm that can effectively handle data with high dimensions and complex distribution.

Key words: clustering, low-rank representation, rank constraint, logarithmic determinant low rank