Journal of Frontiers of Computer Science and Technology ›› 2012, Vol. 6 ›› Issue (10): 948-953. DOI: 10.3778/j.issn.1673-9418.2012.10.010

• Academic Research •

高维数据的1-范数支持向量机集成特征选择

鲍  捷+,杨  明,刘会东   

  1. 南京师范大学 计算机科学与技术学院,南京 210046
  • 出版日期:2012-10-01 发布日期:2012-09-28

Ensemble Feature Selection Based on 1-Norm Support Vector Machine for High-Dimensional Data

BAO Jie+, YANG Ming, LIU Huidong   

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210046, China
  • Online: 2012-10-01 Published: 2012-09-28

摘要: 特征选择是机器学习和模式识别领域的关键问题之一。随着模式识别与数据挖掘的深入,研究对象越来越复杂,对象的特征维数也越来越高,此时特征选择的稳定性也显得尤为重要。分析了1-范数支持向量机,用该方法对高维数据进行特征选择,并对特征选择的结果进行集成;提出了一种针对高维数据的稳定性度量方法;在基因表达数据上的实验结果表明,集成特征选择可以有效提高算法的稳定性。

关键词: 特征选择, 高维数据, 稳定性, 1-范数支持向量机, 集成

Abstract: Feature selection is a key problem in machine learning and pattern recognition. As pattern recognition and data mining advance, the objects under study grow more complex and their feature dimensionality grows higher, which makes the stability of feature selection particularly important. Building on the sparse SVM (support vector machine) model, this paper analyzes the 1-norm support vector machine (L1SVM), applies it to feature selection on high-dimensional data, and combines the resulting feature subsets following the principle of ensemble learning. The paper also proposes a new stability measure for high-dimensional data. Experimental results on gene expression data show that ensemble feature selection effectively improves the stability of feature selection.
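For orientation, the 1-norm SVM is commonly written as the soft-margin problem below. The abstract itself gives no formula, so the symbols here are assumptions: w is the weight vector, b the bias, ξ_i the slack variables, C the trade-off parameter, and (x_i, y_i) the n training samples with labels y_i ∈ {−1, +1}. The paper's exact formulation may differ.

```latex
\min_{w,\, b,\, \xi} \; \|w\|_1 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i, \qquad
\xi_i \ge 0, \qquad i = 1, \dots, n
```

The 1-norm penalty tends to drive many components of w to exactly zero, so the features with nonzero weight can be read off as the selected subset.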

Key words: feature selection, high-dimensional data, stability, 1-norm support vector machine (L1SVM), ensemble
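
The ensemble step and the notion of stability described in the abstract can be illustrated with a minimal sketch. It assumes that each base selector (for example, an L1SVM trained on a resampled copy of the data) has already returned a set of feature indices. Majority voting and the mean pairwise Jaccard index are generic stand-ins chosen for illustration; they are not the aggregation rule or the stability measure proposed in the paper, which the abstract does not specify. All function names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def aggregate_by_vote(subsets, k):
    """Return the k features chosen most often across the base selectors."""
    votes = Counter(f for s in subsets for f in set(s))
    # Sort by descending vote count, then by feature index so ties are deterministic.
    ranked = sorted(votes, key=lambda f: (-votes[f], f))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(subsets):
    """Average Jaccard similarity over all pairs of subsets (generic stability score)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature subsets from three base selectors (illustrative data only).
runs = [{0, 3, 7}, {0, 3, 9}, {0, 4, 7}]
selected = aggregate_by_vote(runs, k=3)
stability = mean_pairwise_jaccard(runs)
print(sorted(selected), round(stability, 3))
```

A score near 1 means the selectors agree on almost the same features across resampled data, while a score near 0 means the selected subsets barely overlap.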