计算机科学与探索 ›› 2016, Vol. 10 ›› Issue (1): 112-121.DOI: 10.3778/j.issn.1673-9418.1505052

• 人工智能与模式识别 • 上一篇    下一篇

不定核大间隔聚类算法

李  森1,2,薛  晖1,2+   

  1. 1. 东南大学 计算机科学与工程学院,南京 211189
    2. 东南大学 计算机网络和信息集成教育部重点实验室,南京 211189
  • 出版日期:2016-01-01 发布日期:2016-01-07

Indefinite Kernel Maximum Margin Clustering Algorithm

LI Sen1,2, XUE Hui1,2+   

  1. 1. School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
    2. Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 211189, China
  • Online:2016-01-01 Published:2016-01-07

摘要: 受限于传统统计学习理论,大多数核方法都要求核矩阵半正定,但是在很多实际问题中这样的要求常常很难满足,由此产生了不定核。近年来,研究者们提出了一系列基于不定核的分类方法,取得了很好的性能,但是关于不定核聚类方法的研究相对较少,而且现有的核聚类算法基本上都是基于正定核而设计的,无法或者很难处理核矩阵不定的情况。针对此问题,以大间隔聚类(maximum margin clustering,MMC)模型为基础,提出了一种新的不定核大间隔聚类(indefinite kernel maximum margin clustering,IKMMC)算法。IKMMC算法旨在寻求一个正定核以逼近不定核,并将度量两者差异性的F-范数作为一个正则化项嵌入到MMC框架中。首先给定样本初始标记,然后迭代优化目标函数,并将每步迭代得到的样本预测错误率作为迭代终止条件。在每步迭代时,IKMMC算法进一步将目标函数转化为半无限规划(semi-infinite program,SIP)形式,并动态调整约束集进行交替优化。实验验证了IKMMC算法的有效性。

关键词: 大间隔聚类, 不定核, 半无限规划

Abstract: Limited to traditional statistical learning theory, most kernel methods require the kernel matrices to be positive semi-definite. But in many practical problems, such requirement is difficult to be satisfied and thus indefinite kernels appear. Recently, researchers have proposed many methods to solve indefinite kernel classification problems and achieved much better performance. However, the research about indefinite kernel clustering is relatively scarce. Furthermore, existing clustering methods are mainly designed based on positive definite kernels which are incapable in indefinite kernel scenarios. This paper proposes a novel method termed as indefinite kernel maximum margin clustering (IKMMC) based on the state-of-the-art maximum margin clustering (MMC) model. IKMMC aims to find a proxy positive definite kernel to approximate the original indefinite one and thus embeds a new F-norm regularizer measuring the diversity of the two kernels into the MMC model. Given a set of initial labels, IKMMC uses an iterative approach to optimize the objective function and takes the error rate of prediction as the loop-termination criteria. At each iteration, IKMMC transfers the objective function into semi-infinite program (SIP) form, and dynamically adjusts constraint set for optimizing alternately. The experimental results show the superiority of IKMMC algorithm.

Key words: maximum margin clustering, indefinite kernel, semi-infinite program