Advanced SemiBoost-CR Categorization Model Utilizing Confidence-Based Resampling

Journal of Frontiers of Computer Science and Technology ›› 2011, Vol. 5 ›› Issue (11): 1048-1056.

• Academic Research •

TANG Huanling, LU Mingyu

1. School of Computer Science and Technology, Shandong Institute of Business and Technology, Yantai, Shandong 264005, China
2. School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-11-01 Published: 2011-11-01

Abstract

Abstract: This paper proposes SemiBoost-CR, a categorization model based on confidence-based resampling that combines semi-supervised learning with ensemble learning. A formula is given for computing the confidence score of an unlabeled example from its labeled and unlabeled nearest neighbors. Under confidence-based resampling, a certain proportion of the unlabeled examples with higher confidence scores and a certain proportion of those with lower confidence scores are selected and added, each under a different strategy, to the labeled training set. Introducing the higher-confidence unlabeled examples is intended to improve the accuracy of the base classifiers, while introducing the lower-confidence ones is intended to further increase the diversity among the base classifiers. Comparative experiments show that SemiBoost-CR effectively improves the performance of a Naive Bayesian text classifier.

Key words: boosting, semi-supervised categorization, Naive Bayesian, confidence, resampling
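The selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the confidence formula, the selection proportions, or the strategies used to add the two groups to the training set, so everything below is an assumption made for illustration: `neighbor_confidence` is a hypothetical score (the share of nearest neighbors that agree with the predicted label), and `resample_by_confidence`, `high_ratio`, and `low_ratio` are hypothetical names and values.

```python
from math import dist


def neighbor_confidence(x, predicted, labeled, unlabeled, k=3):
    """Hypothetical confidence score (NOT the paper's formula): the share of
    the k nearest neighbors of x whose label agrees with `predicted`.

    labeled   -- list of (vector, true_label) pairs
    unlabeled -- list of (vector, currently_predicted_label) pairs,
                 excluding x itself
    """
    pool = [(dist(x, v), y) for v, y in labeled + unlabeled]
    pool.sort(key=lambda pair: pair[0])  # nearest neighbors first
    nearest = pool[:k]
    if not nearest:
        return 0.0
    return sum(1 for _, y in nearest if y == predicted) / len(nearest)


def resample_by_confidence(scores, high_ratio=0.2, low_ratio=0.1):
    """Return (high, low): indices of the most and least confident examples.

    The ratios are assumed values; the abstract only says that "a certain
    proportion" of each group is selected.
    """
    # Indices sorted by descending confidence (ties keep their input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_high = int(len(order) * high_ratio)
    # Cap the low group so the two groups never overlap.
    n_low = min(int(len(order) * low_ratio), len(order) - n_high)
    high = order[:n_high]
    low = order[len(order) - n_low:]
    return high, low


if __name__ == "__main__":
    labeled = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((5.0, 5.0), "b")]
    unlabeled = [((1.0, 0.0), "a"), ((5.0, 4.0), "b")]
    print(neighbor_confidence((0.5, 0.5), "a", labeled, unlabeled))
    print(resample_by_confidence([0.9, 0.1, 0.5, 0.8, 0.3]))
```

In a boosting loop, the `high` indices would be the candidates meant to improve base-classifier accuracy and the `low` indices the candidates meant to increase diversity; how each group is weighted or labeled when it joins the training set is left open here because the abstract does not specify it.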

TANG Huanling, LU Mingyu. Advanced SemiBoost-CR Categorization Model Utilizing Confidence-Based Resampling[J]. Journal of Frontiers of Computer Science and Technology, 2011, 5(11): 1048-1056.

