Smooth Non-negative Low-Rank Graph Representation for Clustering

doi:10.3778/j.issn.1673-9418.2212041

Abstract

Abstract: Existing low-rank graph representation algorithms fail to capture the global representation structure of data accurately, do not make full use of the valid information in the data to guide construction of the representation graph, and produce representation graphs that lack a connected structure suitable for clustering. To address these problems, a smooth non-negative low-rank graph representation method for clustering (SNLRR) is proposed. SNLRR replaces the nuclear norm with a logarithmic determinant function, which better matches the properties of matrix rank, to estimate the rank smoothly. This reduces the influence of large singular values on the rank estimate, balances the contribution of all singular values, and improves the accuracy of the estimate, so that the global representation structure of the data is captured more precisely. To capture the local representation structure more accurately, SNLRR introduces a distance regularization term that adaptively assigns optimal neighbors to each data point when learning the representation matrix. In addition, SNLRR imposes a rank constraint on the Laplacian matrix of the representation matrix so that the learned representation graph has exactly as many connected components as there are clusters, that is, the graph has a connected structure suitable for clustering. In experiments against eight comparison algorithms on seven high-dimensional datasets with complex distributions, SNLRR achieves better clustering performance than all eight, improving accuracy by 0.2073 and NMI by 0.1758 on average. SNLRR is therefore a graph representation clustering algorithm that can effectively handle high-dimensional data with complex distributions.
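The abstract's claim that a log-determinant function damps the influence of large singular values can be illustrated numerically. The sketch below is not the paper's formulation: it assumes one commonly used form, log det(I + ZᵀZ) = Σ log(1 + σᵢ²), and the function names `nuclear_norm` and `logdet_surrogate` are hypothetical. It compares two rank-2 matrices that differ only in the size of their leading singular value.

```python
import numpy as np


def nuclear_norm(Z):
    """Sum of singular values: the convex rank surrogate being replaced."""
    return float(np.linalg.svd(Z, compute_uv=False).sum())


def logdet_surrogate(Z):
    """Assumed form: log det(I + Z^T Z) = sum_i log(1 + sigma_i^2).

    The exact function used by SNLRR may differ; this is illustrative only.
    """
    sigma = np.linalg.svd(Z, compute_uv=False)
    return float(np.log1p(sigma ** 2).sum())


# Two matrices of the same rank (2); only the leading singular value differs.
small = np.diag([1.0, 1.0, 0.0])
large = np.diag([100.0, 1.0, 0.0])

for name, Z in (("small", small), ("large", large)):
    print(name, "nuclear:", nuclear_norm(Z), "logdet:", logdet_surrogate(Z))
```

Both matrices have rank 2, yet the nuclear norm grows linearly with the leading singular value while the log-determinant value grows only logarithmically, so it stays closer to a quantity that depends on how many singular values are nonzero rather than on how large they are.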

Key words: clustering, low-rank representation, rank constraint, logarithmic determinant low rank
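The rank constraint mentioned in the abstract rests on a standard fact from spectral graph theory: for a graph with non-negative weights on n nodes, the multiplicity of the zero eigenvalue of its Laplacian equals the number of connected components c, so rank(L) = n - c. The following is a minimal sketch of that fact only, not of the SNLRR optimization. The symmetrization (|Z| + |Z|ᵀ)/2, the tolerance `tol`, and the helper name `count_components` are assumptions introduced for illustration.

```python
import numpy as np


def count_components(Z, tol=1e-8):
    """Count connected components via near-zero Laplacian eigenvalues."""
    A = np.abs(Z)
    W = (A + A.T) / 2.0                 # assumed affinity built from Z
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)     # L is symmetric, so eigvalsh applies
    return int(np.sum(eigvals < tol))


# A toy representation matrix with two diagonal blocks (two clusters).
Z = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

print("components:", count_components(Z))

# Adding a single edge between the blocks merges them into one component.
Z_bridged = Z.copy()
Z_bridged[1, 2] = Z_bridged[2, 1] = 1.0
print("components after bridging:", count_components(Z_bridged))
```

When the graph has exactly c components, cluster labels can be read directly from the components, which is why a representation graph with this property is described as suitable for clustering.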

QIAN Luoxiong, CHEN Mei, ZHANG Chi, ZHANG Jinhong, MA Xueyan. Smooth Non-negative Low-Rank Graph Representation for Clustering[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(3): 659-673.
