Improved Ramp-Based Twin Support Vector Clustering

doi:10.3778/j.issn.1673-9418.2206039

Abstract

Twin support vector clustering based on the Hinge loss and twin support vector clustering based on the Ramp loss are two recent plane-based clustering algorithms. They offer a new approach to the clustering problem and have become an active research topic in pattern recognition and related fields. However, both often perform poorly on clustering problems with noisy data. To address this, an asymmetric Ramp loss function is constructed, and an improved Ramp-based twin support vector clustering algorithm is proposed on that basis. The asymmetric Ramp loss function inherits the advantages of the Ramp loss function: it measures the within-cluster and between-cluster scatters with asymmetric bounded functions, which makes the algorithm more robust to data points far from the cluster center plane. In addition, the parameter t makes the asymmetric Ramp loss function more flexible. In particular, when t equals 1, the asymmetric Ramp loss function reduces to the Ramp loss function, so Ramp-based twin support vector clustering is a special case of the proposed algorithm. The method is also extended to the nonlinear case via the kernel trick. Both the linear and nonlinear models are non-convex optimization problems and are solved with an alternating iterative algorithm. Experiments on several UCI benchmark datasets and artificial datasets verify the effectiveness of the proposed algorithm.

Key words: clustering, twin support vector clustering, loss function
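
To make the idea of a bounded, asymmetric loss concrete, the following is a minimal Python sketch. It is not the formulation from the paper, which the abstract does not state. The function names, the margin `eps`, the bound `cap`, and the way `t` rescales one side of the deviation are all assumptions chosen only to illustrate two properties described above: the penalty stays bounded for points far from the cluster plane, and the asymmetric form reduces to the symmetric one when t equals 1.

```python
import numpy as np


def hinge_like(u, eps=0.5):
    """Unbounded penalty on the deviation |u| beyond a margin eps (hypothetical form)."""
    return np.maximum(0.0, np.abs(u) - eps)


def symmetric_ramp(u, eps=0.5, cap=1.0):
    """Same penalty clipped at cap, so distant points cannot dominate the objective."""
    return np.clip(np.abs(u) - eps, 0.0, cap)


def asymmetric_ramp(u, t=1.0, eps=0.5, cap=1.0):
    """Assumed asymmetric variant, for illustration only (not the paper's formula):
    deviations on the negative side are scaled by t before clipping."""
    u = np.asarray(u, dtype=float)
    scaled = np.where(u >= 0, u, -t * u)  # u on the positive side, t*|u| on the negative side
    return np.clip(scaled - eps, 0.0, cap)


# Signed deviations of five points from a cluster plane; the two extremes act as outliers.
u = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
print(hinge_like(u))
print(symmetric_ramp(u))
print(asymmetric_ramp(u, t=0.5))
```

In this sketch the outliers at a deviation of 100 receive a penalty of 99.5 under the unbounded loss but only 1.0 under either ramp variant, and setting `t=1.0` makes `asymmetric_ramp` coincide with `symmetric_ramp`.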

CHEN Sugen, LIU Yufei. Improved Ramp-Based Twin Support Vector Clustering[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(11): 2767-2776.
