Constructive Covering Algorithm-Based SMOTE Over-sampling Method

doi:10.3778/j.issn.1673-9418.1905091

Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue 6: 975-984. DOI: 10.3778/j.issn.1673-9418.1905091

YAN Yuanting, ZHU Yuanwei, WU Zengbao, ZHANG Yiwen, ZHANG Yanping

School of Computer Science and Technology, Anhui University, Hefei 230601, China

Online: 2020-06-01  Published: 2020-06-04

Abstract:

Improving the recognition of minority-class samples is an active research topic in imbalanced data classification. The synthetic minority over-sampling technique (SMOTE) is one of the representative methods for this problem. In recent years, researchers have proposed a number of improvements to SMOTE that improve its performance. However, how to effectively select typical minority samples for over-sampling remains an open question. Moreover, the potential of isolated minority samples to improve model performance has not received enough attention. To address these problems, this paper proposes CMOTE, an over-sampling technique based on the constructive covering algorithm (CCA) and SMOTE. CMOTE provides two CCA-based strategies for selecting key samples: one based on the number of samples in a cover and one based on cover density. Experiments on 12 typical imbalanced datasets are conducted to verify the performance of CMOTE. The results show that CMOTE is generally superior to the compared methods, and that it enhances the generalization ability of the model by strengthening the influence of key samples on model performance.
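For readers unfamiliar with SMOTE, the sketch below illustrates only the standard interpolation step that SMOTE-style methods share: a synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. It is not the authors' CMOTE implementation. The function name `smote_oversample` and the parameter `seed_indices` are hypothetical; `seed_indices` stands in for whatever key-sample selection is used (in CMOTE, a cover-based strategy that is not reproduced here).

```python
import math
import random


def smote_oversample(minority, seed_indices, n_new, k=5, rng=None):
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    minority     : list of equal-length numeric tuples (needs at least 2 samples)
    seed_indices : indices into `minority` of the samples to over-sample from
                   (hypothetical stand-in for a key-sample selection strategy)
    """
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seed_indices)
        x = minority[i]
        # Indices of the k nearest minority neighbours of x, excluding x itself.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbors)]
        # New point: x + lam * (nb - x), with lam drawn uniformly from [0, 1).
        lam = rng.random()
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return synthetic


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_oversample(data, [0, 3], 3, k=2, rng=random.Random(0)):
        print(point)
```

Passing the seeds as indices keeps the selection step separate from the interpolation step, so a different selection rule can be swapped in without touching the generator.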

Key words: imbalanced data, over-sampling technique, synthetic minority over-sampling technique (SMOTE), constructive covering algorithm (CCA)

YAN Yuanting, ZHU Yuanwei, WU Zengbao, ZHANG Yiwen, ZHANG Yanping. Constructive Covering Algorithm-Based SMOTE Over-sampling Method[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(6): 975-984.
