Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2012, Vol. 6 ›› Issue (8): 708-716. DOI: 10.3778/j.issn.1673-9418.2012.08.004

• Academic Research •

Robust Canonical Correlation Analysis Based on Kernel-Induced Measure

DING Xin1, CHEN Xiaohong2, CHEN Songcan1,3+

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  2. College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  3. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
  • Online: 2012-08-01 Published: 2012-08-06

Abstract: Canonical correlation analysis (CCA) is a widely used multivariate statistical method for finding linear correlations between two sets of variables that describe the same objects. Because CCA relies on the Euclidean distance measure, it is not robust. Kernel-induced distance measures have been proved robust in theory and have been validated in clustering applications. This paper applies such a measure to CCA and develops a robust CCA based on a kernel-induced measure (KI-CCA). KI-CCA overcomes the non-robustness of CCA and some related algorithms, includes the existing robust principal component analysis based on maximum correntropy (half-quadratic principal component analysis, HQ-PCA) as a special case, and is capable of nonlinear correlation analysis. On the one hand, the diversity of kernel functions yields a corresponding diversity of KI-CCA variants, which makes KI-CCA a general analysis algorithm. On the other hand, because its formulation resembles that of CCA, its solution reduces to a generalized eigenvalue problem. Experiments on synthetic data, the multiple feature database (MFD), and face datasets (Yale, AR, ORL) demonstrate the effectiveness of KI-CCA.

Key words: canonical correlation analysis (CCA), kernel-induced, robustness, generalized eigenvalue problem
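
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' KI-CCA: it shows only (a) the squared distance induced by a Gaussian kernel, which stays bounded however far apart two points are, and (b) standard, non-robust linear CCA solved as a generalized eigenvalue problem. The function names, the Gaussian kernel choice, the `sigma` and `reg` parameters, and the toy data are all assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import eigh


def kernel_induced_sq_distance(x, y, sigma=1.0):
    """Squared feature-space distance induced by a Gaussian kernel (assumed choice).

    ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y) = 2 * (1 - k(x, y)),
    so the value never exceeds 2, unlike the squared Euclidean distance.
    """
    sq_euclid = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return 2.0 * (1.0 - np.exp(-sq_euclid / (2.0 * sigma ** 2)))


def linear_cca(X, Y, reg=1e-8):
    """Standard linear CCA via the generalized eigenvalue problem A w = rho B w.

    X has shape (n, p) and Y has shape (n, q); rows are paired samples.
    Returns the canonical correlations in descending order and the
    projection matrices for X and Y.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n, p = X.shape
    q = Y.shape[1]
    # Covariance blocks; a small ridge keeps B positive definite.
    Cxx = X.T @ X / n + reg * np.eye(p)
    Cyy = Y.T @ Y / n + reg * np.eye(q)
    Cxy = X.T @ Y / n
    A = np.block([[np.zeros((p, p)), Cxy], [Cxy.T, np.zeros((q, q))]])
    B = np.block([[Cxx, np.zeros((p, q))], [np.zeros((q, p)), Cyy]])
    rho, W = eigh(A, B)  # symmetric-definite solver, eigenvalues ascending
    order = np.argsort(rho)[::-1][: min(p, q)]
    return rho[order], W[:p, order], W[p:, order]


# Toy data: the two views share one latent variable z.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])
Y = np.hstack([rng.normal(size=(500, 1)), -z + 0.1 * rng.normal(size=(500, 1))])
rho, Wx, Wy = linear_cca(X, Y)
```

In this sketch the leading entry of `rho` is the correlation between the projections `X @ Wx[:, 0]` and `Y @ Wy[:, 0]`. A single grossly corrupted sample can contribute an arbitrarily large squared Euclidean error, whereas its kernel-induced squared distance is capped at 2, which is the intuition behind the robustness claim.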