Robust Auto-weighted Multi-view Subspace Clustering

doi:10.3778/j.issn.1673-9418.2007003

Abstract

As the ability to collect and store data improves, real-world data are often described by several different representations (views). Multi-view learning therefore plays an increasingly important role in machine learning and pattern recognition. In recent years, a variety of multi-view learning methods have been proposed and applied to different practical scenarios. However, most objective functions measure the error of each data point with a squared residual, so a few outliers with large errors can easily dominate the objective and render it ineffective. How to deal with such outliers and redundant data has thus become an important challenge for multi-view learning. To address this problem, this paper proposes a model termed robust auto-weighted multi-view subspace clustering. The model uses the Frobenius norm to handle the squared error of ordinary data points and, at the same time, the ℓ1-norm to handle outliers, which effectively balances the influence of outliers and ordinary data points on model performance. Furthermore, unlike traditional methods that introduce hyper-parameters to measure the impact of each view, the proposed model learns the weight of each view automatically. Since the resulting problem is non-smooth and non-convex and is difficult to solve directly, this paper designs an effective algorithm to solve it and analyzes the convergence and computational complexity of this algorithm. Experimental results on multi-view datasets demonstrate the effectiveness of the proposed algorithm compared with traditional multi-view subspace clustering algorithms.

Key words: robustness, auto-weighted, multi-view subspace clustering, matrix factorization
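
The contrast between squared residuals and the ℓ1-norm described in the abstract can be illustrated with a small numerical sketch. This is not the paper's objective function, which is not given here. The residual values are made up for illustration, and the helper `auto_view_weights` is a hypothetical stand-in that assumes a common parameter-free rule (weight proportional to one over twice the square root of a view's loss), which may differ from the rule the authors actually derive.

```python
import numpy as np


def squared_frobenius(residual):
    """Sum of squared entries: large entries are amplified quadratically."""
    return float(np.sum(residual ** 2))


def l1_norm(residual):
    """Sum of absolute entries: large entries contribute only linearly."""
    return float(np.sum(np.abs(residual)))


def auto_view_weights(view_losses, eps=1e-12):
    """Hypothetical auto-weighting rule (an assumption, not the paper's):
    w_v proportional to 1 / (2 * sqrt(loss_v)), normalized to sum to 1,
    so views with a smaller loss receive a larger weight."""
    losses = np.asarray(view_losses, dtype=float)
    raw = 1.0 / (2.0 * np.sqrt(losses + eps))
    return raw / raw.sum()


# Illustrative residuals: 99 small errors of 0.1 and a single outlier of 10.
residual = np.full(100, 0.1)
residual[0] = 10.0

# Fraction of the total loss that is caused by the single outlier.
outlier_share_sq = residual[0] ** 2 / squared_frobenius(residual)
outlier_share_l1 = abs(residual[0]) / l1_norm(residual)

# Two views with illustrative losses 1.0 and 4.0.
weights = auto_view_weights([1.0, 4.0])

print(f"outlier share, squared loss: {outlier_share_sq:.3f}")
print(f"outlier share, l1 loss:      {outlier_share_l1:.3f}")
print(f"view weights: {weights}")
```

Under the squared loss the single outlier accounts for almost all of the objective, while under the ℓ1-norm it accounts for roughly half, which is the motivation for treating outliers with a different norm than ordinary points.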

FAN Ruidong, HOU Chenping. Robust Auto-weighted Multi-view Subspace Clustering[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(6): 1062-1073.
