Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (10): 1570-1578. DOI: 10.3778/j.issn.1673-9418.1608046

• Database Technology •

Kernel Principal Component Analysis Rotation Forest Algorithm for Gene Data Classification

陆慧娟1+,刘亚卿1,孟亚琼1,关  伟2,刘砚秋1   

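  1. College of Information Engineering, China Jiliang University, Hangzhou 310018
  2. College of Modern Science and Technology, China Jiliang University, Hangzhou 310018
  • Online: 2017-10-01 Published: 2017-10-20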

Classifier Algorithm of Genetic Data Based on Kernel Principal Component Analysis and Rotation Forest

LU Huijuan1+, LIU Yaqing1, MENG Yaqiong1, GUAN Wei2, LIU Yanqiu1   

  1. College of Information Engineering, China Jiliang University, Hangzhou 310018, China
  2. College of Modern Science and Technology, China Jiliang University, Hangzhou 310018, China
  • Online: 2017-10-01 Published: 2017-10-20

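Abstract (Chinese version): Rotation forest (RoF) is an ensemble classification algorithm that combines linear analysis theory with decision trees. It achieves good results even with a small number of classifiers while preserving the accuracy of the ensemble. For some gene data sets, however, the data are linearly inseparable and the original algorithm classifies poorly. This paper proposes a rotation forest algorithm based on kernel principal component analysis (KPCA-RoF). It uses the Gaussian radial basis kernel function together with principal component analysis to apply a nonlinear mapping and a diversity-inducing transformation to gene data sets, with emphasis on parameter selection, and then uses the decision tree algorithm for ensemble learning. Experiments show that the improved algorithm handles linearly inseparable data well and also improves classification accuracy on gene data sets.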

Key words (Chinese version): kernel function, principal component analysis, decision tree, rotation forest, gene data classification

Abstract: The rotation forest (RoF) algorithm is an ensemble classification algorithm based on linear analysis theory and decision trees. It achieves high classification accuracy even with a small number of classifiers. However, its accuracy drops on gene expression data that are linearly inseparable. To address this issue, this paper proposes a rotation forest algorithm based on kernel principal component analysis (KPCA-RoF), which uses the Gaussian kernel function and principal component analysis to apply a nonlinear mapping and to handle differences in gene data. The proposed algorithm focuses on parameter optimization and uses the decision tree algorithm for ensemble learning. Experiments show that the new algorithm handles linearly inseparable data well and improves classification accuracy.

Key words: kernel function, principal component analysis, decision tree, rotation forest, gene data classification