Journal of Frontiers of Computer Science and Technology ›› 2019, Vol. 13 ›› Issue (3): 521-528. DOI: 10.3778/j.issn.1673-9418.1805068

• Theory and Algorithm •

Mallow’s Cp Selection Algorithm for Rough Set

YANG Guijun, YU Yang+   

  1. School of Statistics, Tianjin University of Finance and Economics, Tianjin 300222, China
  • Online: 2019-03-01 Published: 2019-03-11

Abstract: Rough set selection is a key step in empirical research on rough sets. The misclassification rate is currently the most commonly used selection criterion. However, it does not account for the complexity of a rough set and therefore carries a risk of over-fitting: the rough set with the lowest misclassification rate on a test set does not necessarily have the best generalization ability. The Mallow’s Cp criterion is therefore introduced as a new rough set selection criterion. The Mallow’s Cp selection algorithm for rough sets uses a Logistic model to express the nonlinear rough set classification rules in linear form, takes the Cp value of the Logistic model as the Cp value of the rough set, and selects the rough set according to its Cp value. Empirical results show that the algorithm can identify rough sets with strong generalization ability, and that it selects the rough set with the best generalization ability more frequently than the misclassification rate criterion does. In particular, when the misclassification rates of the candidate rough sets differ only slightly, the new algorithm is more likely than the misclassification rate criterion to select the rough set with the best generalization ability. By balancing the classification accuracy and the complexity of the rough rules, the Mallow’s Cp selection algorithm is better at selecting rough sets that generalize well.

Key words: Mallow’s Cp criterion, Logistic model, model selection, rough set, generalization ability
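
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact Cp formula used for the Logistic model, so the sketch borrows the classical linear-regression form Cp = SSE_p / sigma2 - n + 2p, estimates sigma2 from the most complex candidate, and assumes the smallest Cp is preferred. The names `Candidate`, `select_by_cp` and the fitted probabilities are hypothetical placeholders standing in for real Logistic fits of rough set rules.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Candidate:
    """One candidate rough set, summarised by a (hypothetical) Logistic fit."""
    name: str
    n_params: int           # fitted coefficients, intercept included
    probs: Sequence[float]  # fitted P(y = 1) for each sample


def sse(y: Sequence[int], probs: Sequence[float]) -> float:
    """Sum of squared differences between labels and fitted probabilities."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, probs))


def misclassification_rate(y: Sequence[int], probs: Sequence[float],
                           threshold: float = 0.5) -> float:
    """Share of samples whose thresholded prediction differs from the label."""
    wrong = sum(int(pi >= threshold) != yi for yi, pi in zip(y, probs))
    return wrong / len(y)


def mallows_cp(sse_p: float, sigma2: float, n: int, p: int) -> float:
    """Classical Cp (assumed form): fit term plus a penalty of 2 per parameter."""
    return sse_p / sigma2 - n + 2 * p


def select_by_cp(y: Sequence[int], candidates: Sequence[Candidate],
                 reference: Candidate) -> Tuple[str, Dict[str, float]]:
    """Score every candidate and return the name with the smallest Cp."""
    n = len(y)
    # Assumption: the error variance is estimated from the reference
    # (most complex) candidate, as in the linear-regression setting.
    sigma2 = sse(y, reference.probs) / (n - reference.n_params)
    scores = {c.name: mallows_cp(sse(y, c.probs), sigma2, n, c.n_params)
              for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data: both candidates classify every sample correctly at threshold 0.5.
y = [1, 0, 1, 0, 1, 0, 1, 0]
full = Candidate("full", n_params=4, probs=[0.9, 0.1] * 4)
simple = Candidate("simple", n_params=2, probs=[0.89, 0.11] * 4)

best, scores = select_by_cp(y, [full, simple], reference=full)
print(best, scores)
```

In this toy case the two candidates have the same misclassification rate, so that criterion cannot separate them, while the complexity penalty in Cp favours the candidate with fewer parameters. This mirrors the situation the abstract highlights, where misclassification rates differ only slightly.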