Survey of Research on SMOTE Type Algorithms

doi:10.3778/j.issn.1673-9418.2309079

Abstract

Abstract: The synthetic minority oversampling technique (SMOTE) has become one of the mainstream methods for handling unbalanced data because it deals effectively with minority-class samples. Many improved SMOTE algorithms have been proposed, but very few existing surveys consider the popular algorithm-level improvements. This paper therefore provides a more comprehensive analysis and summary of existing SMOTE-type algorithms. First, the basic principles of SMOTE are described in detail. The SMOTE-type algorithms are then systematically reviewed at two levels, the data level and the algorithm level, and new ideas for hybrid improvements that combine the two levels are introduced. Data-level improvements balance the data distribution by deleting or adding samples through various operations during preprocessing; algorithm-level improvements leave the data distribution unchanged and instead strengthen the focus on minority-class samples by modifying existing algorithms or creating new ones. A comparison of the two shows that data-level methods are less restricted in their application, while algorithm-level improvements are generally more robust. To provide more comprehensive foundational material for research on SMOTE-type algorithms, the paper finally lists commonly used datasets and evaluation metrics and suggests directions for future research to better cope with the unbalanced data problem.

Key words: unbalanced data, synthetic minority oversampling technique (SMOTE), oversampling, supervised learning

WANG Xiaoxia, LI Leixiao, LIN Hao. Survey of Research on SMOTE Type Algorithms[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(5): 1135-1159.
