Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue (3): 457-467. DOI: 10.3778/j.issn.1673-9418.1809047

• Artificial Intelligence and Pattern Recognition •

Variance-Regularized Classification Model Selection Criterion

FANG Lichao1, WANG Yu2,3+, YANG Xingli1, LI Jihong3   

  1. School of Mathematical Sciences, Shanxi University, Taiyuan 030006, China
  2. School of Modern Educational Technology, Shanxi University, Taiyuan 030006, China
  3. School of Software, Shanxi University, Taiyuan 030006, China
  • Online: 2019-03-01 Published: 2019-03-11

Abstract: In traditional machine learning, model selection is often performed directly on the estimate of a performance measure, without considering the variance of that estimate. Ignoring the variance, however, may well lead to the selection of a wrong model. This paper therefore incorporates variance information into classification model selection in order to improve the generalization ability of the selected model: the variance estimate of the blocked 3×2 cross-validation estimate of the generalization error is added as a regularization term to the traditional model selection criterion, yielding a new variance-regularized classification model selection criterion. Experiments on simulated and real data show that, in classification model selection, the proposed criterion selects the correct classification model with higher probability than traditional methods, which confirms both the importance of variance in model selection and the effectiveness of the proposed criterion. Furthermore, the proposed criterion is proven theoretically to be selection-consistent for model selection in two-class classification problems.

Key words: model selection, generalization error, blocked 3×2 cross-validation, variance regularization
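
The criterion summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact partition construction, variance estimator, or regularization weight, so everything below is an assumption made for illustration: the data are cut into four blocks that are paired three ways into halves (one common reading of blocked 3×2 cross-validation), the variance is estimated as the sample variance of the three replicate-level error estimates, and the hypothetical weight `lam` scales the variance term. The function names and the two toy classifiers are likewise invented here and are not the authors' implementation.

```python
import numpy as np


def blocked_3x2_splits(n, rng):
    """Yield three (half_a, half_b) index pairs built from four shuffled blocks.

    Assumed construction: blocks are paired as (0,1)|(2,3), (0,2)|(1,3), (0,3)|(1,2).
    """
    blocks = np.array_split(rng.permutation(n), 4)
    pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    for a, b in pairings:
        half_a = np.concatenate([blocks[i] for i in a])
        half_b = np.concatenate([blocks[i] for i in b])
        yield half_a, half_b


def variance_regularized_score(fit_predict, X, y, lam=1.0, seed=0):
    """Return (score, mean_error, variance_estimate) for one candidate model.

    score = mean 3x2 CV error + lam * variance estimate (assumed form).
    """
    rng = np.random.default_rng(seed)
    rep_errors = []
    for half_a, half_b in blocked_3x2_splits(len(y), rng):
        fold_errors = []
        # Two-fold CV within this replicate: train on one half, test on the other.
        for train, test in ((half_a, half_b), (half_b, half_a)):
            pred = fit_predict(X[train], y[train], X[test])
            fold_errors.append(np.mean(pred != y[test]))
        rep_errors.append(np.mean(fold_errors))
    rep_errors = np.asarray(rep_errors)
    mean_err = rep_errors.mean()
    var_est = rep_errors.var(ddof=1)  # assumed estimator, not taken from the paper
    return mean_err + lam * var_est, mean_err, var_est


def select_model(candidates, X, y, lam=1.0, seed=0):
    """Pick the candidate with the smallest variance-regularized score."""
    scores = {
        name: variance_regularized_score(fit_predict, X, y, lam, seed)[0]
        for name, fit_predict in candidates.items()
    }
    return min(scores, key=scores.get), scores


def nearest_centroid(X_train, y_train, X_test):
    """Toy classifier: assign each test point to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def majority_class(X_train, y_train, X_test):
    """Toy baseline: always predict the most frequent training label."""
    classes, counts = np.unique(y_train, return_counts=True)
    return np.full(len(X_test), classes[counts.argmax()])
```

Calling `select_model({"nearest_centroid": nearest_centroid, "majority_class": majority_class}, X, y, lam=1.0)` on a labeled array pair returns the selected name and the per-model scores. Setting `lam=0` reduces the score to the plain cross-validated error, which corresponds to the traditional criterion the abstract contrasts against.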