Journal of Frontiers of Computer Science and Technology, 2024, Vol. 18, Issue (11): 3027-3040. DOI: 10.3778/j.issn.1673-9418.2309037

• Artificial Intelligence · Pattern Recognition •

Fast Multi-view Clustering with Sparse Matrix and Improved Normalized Cut

YANG Mingrui, ZHOU Shibing, WANG Xi, SONG Wei   

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2024-11-01 Published: 2024-10-31

Abstract: Multi-view clustering is a novel clustering approach that can effectively explore the inherent cluster structure of data. However, most multi-view clustering algorithms are sensitive to noise when constructing similarity graphs and lose information during clustering, which lowers the accuracy of the results. Moreover, existing algorithms usually rely on alternating iterative optimization to reach the optimal solution, and the repeated iterations can cause memory overflow or excessive running time. To address these problems, a fast multi-view clustering algorithm based on a sparse matrix and an improved normalized cut (SINFMC) is proposed. The algorithm first constructs a similarity graph for each view from the raw data and fuses these graphs into a consensus graph matrix. An l1-norm constraint is then applied to the consensus graph matrix to obtain a sparse matrix, which denoises the data and speeds up computation. Finally, an improved normalized spectral clustering algorithm clusters the sparse consensus graph to obtain a cluster indicator matrix, which yields the clustering result directly and eliminates the information loss and bias introduced during clustering. Because the algorithm requires no alternating iterative optimization and its computation is streamlined by the sparse matrix representation, its time and space complexity are substantially reduced. Comparative experiments on artificial and real-world datasets show that the proposed algorithm outperforms the compared algorithms in both clustering quality and efficiency.

Key words: multi-view clustering, sparse matrix, normalized cuts, soft threshold, graph fusion
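
The graph fusion and sparsification steps summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule or how the threshold is chosen, so the unweighted average in `fuse_views`, the value of `tau`, and all function names below are assumptions made for illustration, not the SINFMC implementation. Soft thresholding is used because it is the standard proximal operator of the l1 norm and appears among the keywords. The improved normalized-cut step is not shown.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
SparseMatrix = Dict[Tuple[int, int], float]


def fuse_views(graphs: List[Matrix]) -> Matrix:
    """Fuse per-view similarity graphs by an unweighted average (assumed rule)."""
    n = len(graphs[0])
    return [
        [sum(g[i][j] for g in graphs) / len(graphs) for j in range(n)]
        for i in range(n)
    ]


def soft_threshold(x: float, tau: float) -> float:
    """Proximal operator of tau * |x|: shrink x toward zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sparsify(consensus: Matrix, tau: float) -> SparseMatrix:
    """Soft-threshold every entry and store only the nonzeros (dict of keys)."""
    sparse: SparseMatrix = {}
    for i, row in enumerate(consensus):
        for j, value in enumerate(row):
            shrunk = soft_threshold(value, tau)
            if shrunk != 0.0:
                sparse[(i, j)] = shrunk
    return sparse


# Two toy views of four points: points 0-1 and 2-3 are similar in both views,
# while the weak cross-group entries play the role of noise.
view_a = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
view_b = [
    [1.0, 0.7, 0.0, 0.1],
    [0.7, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
]

consensus = fuse_views([view_a, view_b])
sparse_consensus = sparsify(consensus, tau=0.2)  # tau is an arbitrary choice
print(len(sparse_consensus))  # 8 of the 16 entries survive
```

In this toy case every cross-group entry of the consensus graph is 0.05, so thresholding removes all of them and leaves two disconnected blocks, which is the block structure a spectral method would then separate. For realistic data sizes a dedicated sparse-matrix library would replace the dictionary used here.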