Analysis of Variance of F1 Measure Based on Blocked 3×2 Cross Validation

Journal of Frontiers of Computer Science and Technology ›› 2016, Vol. 10 ›› Issue (8): 1176-1183. DOI: 10.3778/j.issn.1673-9418.1603082

YANG Liu1+, WANG Yu2

1. School of Applied Mathematics, Shanxi University of Finance & Economics, Taiyuan 030006, China
2. School of Software, Shanxi University, Taiyuan 030006, China

Online: 2016-08-01 Published: 2016-08-09

Abstract

Abstract: In statistical machine learning, researchers often run quantitative experiments that compare the F1 measure of classification algorithms estimated by cross validation. To reach statistically sound conclusions, it is important to estimate the uncertainty of the F1 measure. In particular, both theory and experiments have shown that blocked 3×2 cross validation outperforms other commonly used cross validation methods, such as standard K-fold cross validation. This paper therefore studies, theoretically, the variance of the F1 measure based on blocked 3×2 cross validation. The structure of the variance shows that it consists of three parts: the block variance, the within-block covariance, and the between-blocks covariance. This implies that the widely used sample variance estimator may seriously underestimate or overestimate the true variance. Experiments on simulated and real data sets, using the bar chart method, validate these theoretical results: the within-block and between-blocks covariances are of the same order as the block variance, so the within-block and between-blocks correlations cannot be neglected.

Key words: F1 measure, cross validation, variance, classification algorithm, simulated experiment
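The abstract does not spell out how the splits are built or how the covariance terms enter the variance, so the following Python sketch illustrates one common reading and should not be taken as the authors' implementation. It assumes that blocked 3×2 cross validation shuffles the data into four equal quarters and forms three 2-fold replications from the three ways of pairing those quarters, that the overall estimate is the plain average of the six fold-level F1 values, and that those six values share a common variance `sigma2`, a correlation `rho_within` inside a replication, and a correlation `rho_between` across replications. All function and parameter names are hypothetical.

```python
import random


def blocked_3x2_splits(n, seed=0):
    """Return 6 (train, test) index pairs: 3 replications of 2-fold CV.

    Assumption: indices are shuffled into 4 equal quarters, and each of the
    3 ways of pairing the quarters into two halves gives one replication.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    q = n // 4  # quarter size; the n % 4 leftover indices are dropped
    quarters = [idx[i * q:(i + 1) * q] for i in range(4)]
    pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    splits = []
    for a, b in pairings:
        half_a = quarters[a[0]] + quarters[a[1]]
        half_b = quarters[b[0]] + quarters[b[1]]
        splits.append((half_a, half_b))  # train on half_a, test on half_b
        splits.append((half_b, half_a))  # swap the roles of the two halves
    return splits


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def variance_of_mean(sigma2, rho_within, rho_between):
    """Variance of the average of 6 equally variable, correlated estimates.

    Each estimate has 1 partner in its own replication and 4 partners in
    the other two replications, which gives the factor (1 + rho_w + 4*rho_b).
    """
    return sigma2 / 6 * (1 + rho_within + 4 * rho_between)


def naive_variance_of_mean(sigma2):
    """Variance of the average if the 6 estimates were uncorrelated."""
    return sigma2 / 6
```

Under these assumptions, any two training sets taken from different replications share exactly one quarter of the data, which is one way to see why the fold-level estimates are correlated. With, say, `sigma2 = 1.0` and both correlations equal to 0.5, `variance_of_mean` is 3.5 times `naive_variance_of_mean`, so treating the six values as independent would understate the variance considerably.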

YANG Liu, WANG Yu. Analysis of Variance of F1 Measure Based on Blocked 3×2 Cross Validation[J]. Journal of Frontiers of Computer Science and Technology, 2016, 10(8): 1176-1183.

