Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2020, Vol. 14 ›› Issue (1): 108-116. DOI: 10.3778/j.issn.1673-9418.1903054

• Artificial Intelligence •

Model Decision Forest Algorithm (一种模型决策森林算法)

YIN Ru (尹儒), MEN Changqian (门昌骞), WANG Wenjian (王文剑)

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
  • Online: 2020-01-01    Published: 2020-01-09

Abstract: Random forest (RF) is widely used in machine learning because it is robust to noise, predicts accurately, and handles high-dimensional data. The model decision tree (MDT) is an accelerated decision tree algorithm: it improves training efficiency, but its accuracy decreases as the size of its impure pseudo-leaf nodes grows. To address this problem, a model decision forest (MDF) algorithm is proposed to improve the classification accuracy of MDT. MDF takes the MDT as its base classifier and applies the idea of random forest to generate multiple model decision trees. The algorithm first obtains different sample subsets via a rotation matrix, then trains a distinct model decision tree on each subset, and finally combines the trees by voting so that the resulting forest outputs the classification. Experimental results on benchmark datasets show that MDF is clearly more accurate than MDT. Moreover, MDF achieves good accuracy even with a small number of trees, which avoids the growth in time complexity that comes with adding more trees.

Key words: Gini index, model decision forest (MDF), model decision tree (MDT), random forest (RF)
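
The pipeline summarized in the abstract (rotate the data, train one tree per rotated view, combine the trees by voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the rotation matrix is built, so a random planar rotation is assumed here, and a one-split decision stump chosen by the Gini index stands in for the MDT base classifier. All function names (`gini`, `rotate`, `fit_stump`, `predict_stump`, `fit_forest`, `predict_forest`) are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def rotate(point, angle):
    """Apply a 2x2 rotation matrix (angle in radians) to a 2-D point."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def fit_stump(points, labels):
    """Stand-in base learner (NOT an MDT): the single axis-aligned split
    with the lowest weighted Gini index."""
    best = None
    for axis in (0, 1):
        values = sorted({p[axis] for p in points})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [l for p, l in zip(points, labels) if p[axis] <= thr]
            right = [l for p, l in zip(points, labels) if p[axis] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, axis, thr,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        # No valid split exists: always predict the majority class.
        major = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), major, major)
    return best[1:]


def predict_stump(stump, point):
    axis, thr, left_label, right_label = stump
    return left_label if point[axis] <= thr else right_label


def fit_forest(points, labels, n_trees=5, seed=0):
    """Train one stump per randomly rotated view of the data
    (assumed rotation scheme, for illustration only)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        angle = rng.uniform(0.0, math.pi)
        rotated = [rotate(p, angle) for p in points]
        forest.append((angle, fit_stump(rotated, labels)))
    return forest


def predict_forest(forest, point):
    """Majority vote over the trees; each tree sees the point in its own rotation."""
    votes = Counter(predict_stump(stump, rotate(point, angle))
                    for angle, stump in forest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters with labels 0 and 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(points, labels, n_trees=5)
```

Each tree stores its own rotation angle so that a query point is rotated the same way before that tree votes. In this sketch the differences between trees come only from the rotations, not from changes to the base learner.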