Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

• Academic Research •

Improved Ramp-based twin support vector clustering

CHEN Sugen (陈素根), LIU Yufei (刘玉菲)

  1. School of Mathematics and Physics, Anqing Normal University, Anqing, Anhui 246133, China
  2. Key Laboratory of Modeling, Simulation and Control of Complex Ecosystem in Dabie Mountains of Anhui Higher Education Institutes, Anqing, Anhui 246133, China
  3. International Joint Research Center of Simulation and Control for Population Ecology of Yangtze River in Anhui Province, Anqing, Anhui 246133, China

Abstract: Twin support vector clustering based on the Hinge loss and twin support vector clustering based on the Ramp loss are two recent plane-based clustering algorithms. They offer a new approach to the clustering problem and have gradually become a research focus in pattern recognition and related fields. However, both often perform poorly on clustering problems with noisy data. To address this, this paper constructs an asymmetric Ramp loss function and, building on it, proposes an improved Ramp-based twin support vector clustering algorithm. The asymmetric Ramp loss inherits the advantages of the Ramp loss: it measures the within-cluster and between-cluster scatter with asymmetric bounded functions, which makes the algorithm more robust to data points far from the cluster center plane. In addition, the parameter t makes the asymmetric Ramp loss more flexible. In particular, when t equals 1, the asymmetric Ramp loss reduces to the Ramp loss, so Ramp-based twin support vector clustering becomes a special case of the proposed algorithm. The algorithm is also extended to the nonlinear case via the kernel trick. Both the linear and the nonlinear models are non-convex optimization problems, which are solved with an alternating iterative algorithm. Experiments on several benchmark UCI datasets and artificial datasets verify the effectiveness of the proposed algorithm.

Key words: Clustering, Twin support vector clustering, Loss function
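
The robustness argument in the abstract rests on the Ramp loss being bounded while the Hinge loss is not. The sketch below is only an illustration of that contrast, using the generic textbook definitions on a scalar margin. The function names, the clipping parameter `s`, and the sample values are assumptions made for this example. It does not reproduce the clustering-specific within-cluster and between-cluster losses of the paper, nor the asymmetric Ramp loss with parameter t, whose formula the abstract does not state.

```python
def hinge_loss(margin: float) -> float:
    """Generic Hinge loss max(0, 1 - margin); unbounded as margin decreases."""
    return max(0.0, 1.0 - margin)


def ramp_loss(margin: float, s: float = -1.0) -> float:
    """Generic Ramp loss: the Hinge loss clipped at the ceiling 1 - s.

    `s` (assumed here, with s < 1) sets where the loss stops growing.
    """
    if s >= 1.0:
        raise ValueError("s must be smaller than 1")
    return min(1.0 - s, hinge_loss(margin))


if __name__ == "__main__":
    # Hypothetical margins; the last two mimic points far on the wrong side.
    for m in [2.0, 0.5, -1.0, -10.0, -100.0]:
        print(f"margin={m:7.1f}  hinge={hinge_loss(m):6.1f}  ramp={ramp_loss(m):4.1f}")
```

With the default `s = -1`, the Ramp loss never exceeds 2, so a single extreme point cannot dominate the objective, whereas its Hinge loss keeps growing linearly with the violation.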