Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (2): 496-505. DOI: 10.3778/j.issn.1673-9418.2211073

• Artificial Intelligence · Pattern Recognition •


Ensemble Feature Selection Method with Fast Transfer Model

NING Baobin, WANG Shitong   

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2024-02-01  Published: 2024-02-01


Abstract: Compared with traditional ensemble feature selection methods, the recently developed ensemble feature selection with block-regularized m×2 cross-validation (EFSBCV) not only yields an estimator whose variance is smaller than that of random m×2 cross-validation, but also raises the selection probability of important features and lowers the selection probability of noise features. However, the linear regression model adopted in EFSBCV contains only an error term and no bias term, so the fitted hyperplane always passes through the origin, which easily leads to underfitting. Moreover, EFSBCV does not consider the importance of each feature subset. To address these two problems, this paper proposes an ensemble feature selection method with a fast transfer model (EFSFT). The basic idea is to replace the base feature selector in EFSBCV with the proposed fast transfer model, thereby introducing a bias term; EFSFT transfers the 2m feature subsets as source knowledge and then re-quantifies the weight of each subset, and the linear model with a bias term has better fitting ability. Experiments on real datasets show that, compared with EFSBCV, the average FP value of EFSFT is 58% lower, demonstrating that EFSFT is better at removing noise features; compared with the least-squares support vector machine (LSSVM), the average TP value of EFSFT is 5% higher, demonstrating that EFSFT is better at selecting important features.
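The abstract's point about the missing bias term can be made concrete in standard regression notation (a minimal illustration, not drawn from the paper itself):

    % Without a bias term, the fitted hyperplane is forced through the origin:
    f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x}
    \quad\Longrightarrow\quad
    f(\mathbf{0}) = 0 \ \text{for every } \mathbf{w}.
    % With a bias term b, the hyperplane can shift away from the origin,
    % which is why a model with a bias term is less prone to underfitting:
    f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b
    \quad\Longrightarrow\quad
    f(\mathbf{0}) = b.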
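For readers who want the mechanics, the following minimal Python sketch mimics the m×2 cross-validation ensemble described above: 2-fold CV repeated m times yields 2m base selectors, each trained with a bias term (fit_intercept=True), and each resulting feature subset is reweighted by its validation fit. All names here (ensemble_select, threshold, the R²-based weighting, the Ridge base learner) are illustrative assumptions, not the authors' EFSFT implementation.

    # Minimal sketch of an m-by-2 cross-validation ensemble feature selector.
    # Hypothetical stand-in for EFSFT; the reweighting rule is an assumption.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import KFold

    def ensemble_select(X, y, m=3, threshold=0.1, rng=0):
        n, d = X.shape
        votes = np.zeros(d)      # accumulated, weighted votes per feature
        total_weight = 0.0
        for rep in range(m):     # m repetitions of 2-fold CV -> 2m subsets
            kf = KFold(n_splits=2, shuffle=True, random_state=rng + rep)
            for train_idx, val_idx in kf.split(X):
                # Base selector: linear model WITH a bias term, so the
                # fitted hyperplane need not pass through the origin.
                model = Ridge(alpha=1.0, fit_intercept=True)
                model.fit(X[train_idx], y[train_idx])
                w = np.abs(model.coef_)
                selected = w / w.max() > threshold   # one feature subset
                # Reweight this subset by its validation fit (higher R^2 ->
                # larger weight), standing in for re-quantified subset weights.
                weight = max(model.score(X[val_idx], y[val_idx]), 0.0)
                votes += weight * selected
                total_weight += weight
        return votes / max(total_weight, 1e-12)  # per-feature score in [0, 1]

    # Usage: keep features whose aggregated score exceeds 0.5.
    X = np.random.randn(200, 20)
    y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * np.random.randn(200)
    scores = ensemble_select(X, y)
    print(np.where(scores > 0.5)[0])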

Key words: ensemble feature selection, cross-validation, transfer learning, regression