Journal of Frontiers of Computer Science and Technology, 2018, Vol. 12, Issue (9): 1434-1443. DOI: 10.3778/j.issn.1673-9418.1705041

Diverse Random Subspace Ensemble

DING Yi, WANG Mingliang, ZHANG Daoqiang+   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
  • Online:2018-09-01 Published:2018-09-10

差异性随机子空间集成

丁毅,王明亮,张道强+

  1. 南京航空航天大学 计算机科学与技术学院,南京 211100

Abstract: The random subspace ensemble method is an important part of ensemble learning research. It trains several base learners on randomly selected feature subspaces and combines their outputs into a final result. The method is particularly suitable for datasets with far more feature dimensions than samples. With such high-dimensional data, however, the ensemble has to sample a large number of subspaces and cannot maintain enough diversity among them, which leads to low efficiency and poor performance. This paper proposes a diverse random subspace ensemble method that is unsupervised and requires no training. The method uses multi-kernel maximum mean discrepancy (MMD) as the similarity measure between subspaces and applies spectral clustering to group highly similar subspaces, selecting one representative from each group of random subspaces with similar distribution structure. Experimental results demonstrate the effectiveness and efficiency of the proposed method when fewer base learners are used, especially on datasets with a high feature-sample ratio.

Key words: random subspace ensemble, diversity, ensemble learning, machine learning
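
The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `multi_kernel_mmd2` and `select_diverse_subspaces`, the RBF bandwidths in `gammas`, the biased MMD estimator, the `exp(-MMD / scale)` affinity, and the rule for picking a cluster representative are all hypothetical choices. It also assumes every candidate subspace has the same dimension, so that two subspaces can be compared as two sample sets in the same space.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def multi_kernel_mmd2(X, Y, gammas=(0.1, 1.0, 10.0)):
    """Biased estimate of squared MMD under an average of RBF kernels.

    The bandwidths in `gammas` are illustrative assumptions.
    """
    Z = np.vstack([X, Y])
    sq = np.sum(Z ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T), 0.0)
    K = np.mean([np.exp(-g * d2) for g in gammas], axis=0)
    n = len(X)
    return K[:n, :n].mean() + K[n:, n:].mean() - 2.0 * K[:n, n:].mean()


def select_diverse_subspaces(X, n_candidates=30, subspace_dim=5,
                             n_clusters=5, seed=0):
    """Sample random feature subspaces and keep one per spectral cluster."""
    rng = np.random.default_rng(seed)
    candidates = [rng.choice(X.shape[1], size=subspace_dim, replace=False)
                  for _ in range(n_candidates)]

    # Pairwise MMD between the data restricted to each candidate subspace.
    mmd = np.zeros((n_candidates, n_candidates))
    for i in range(n_candidates):
        for j in range(i + 1, n_candidates):
            value = multi_kernel_mmd2(X[:, candidates[i]], X[:, candidates[j]])
            mmd[i, j] = mmd[j, i] = max(value, 0.0)

    # Assumed affinity: a small MMD means similar subspaces.
    positive = mmd[mmd > 0]
    scale = positive.mean() if positive.size else 1.0
    affinity = np.exp(-mmd / scale)

    labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                random_state=seed).fit_predict(affinity)

    chosen = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # Assumed rule: the member most similar to the rest of its cluster.
        within = affinity[np.ix_(members, members)].sum(axis=1)
        chosen.append(candidates[members[np.argmax(within)]])
    return chosen


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(40, 50))
    for idx in select_diverse_subspaces(data):
        print(sorted(idx.tolist()))
```

Each returned index array would then be used to train one base learner on `X[:, idx]`, with the learners' predictions combined as in an ordinary random subspace ensemble. No labels are used at any point, which matches the unsupervised, training-free character the abstract claims for the selection step.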

摘要: 随机子空间集成方法是集成学习中的一个重要部分,它通过随机选取原特征空间中的数个子空间构建基分类器并集成基学习器得到最终的结果。随机子空间集成方法尤其适用于特征维度高于样本数量的情况,而传统的随机子空间集成对高维数据采集大量的子空间且子空间之间存在很高的冗余度,从而导致模型获得较差的性能。因此,提出了一种无监督和不需要训练的差异性随机子空间集成算法。该算法利用多核最大均值差异(maximum mean discrepancy,MMD)作为子空间的相似性度量,并利用谱聚类算法将高相似性子空间聚类,从中选择一个代表性子空间,从而得到差异性子空间集合。实验表明,基于差异性随机子空间集成的模型在使用较少的基学习器时依然能获得较好的性能,尤其在具有很高的特征-样本比的数据集上。

关键词: 随机子空间集成, 差异性度量, 集成学习, 机器学习