%0 Journal Article
%A XIE Zipeng
%A BAO Chongming
%A ZHOU Lihua
%A WANG Chongyun
%A KONG Bing
%T EM Clustering Oversampling Algorithm for Class Imbalanced Data
%D 2023
%R 10.3778/j.issn.1673-9418.2104080
%J Journal of Frontiers of Computer Science & Technology
%P 228-237
%V 17
%N 1
%X To address the low classification performance caused by imbalanced datasets in classification tasks, an EM (expectation-maximization) clustering oversampling algorithm for imbalanced data is proposed. It addresses the imbalance at its root by oversampling to increase the number of minority-class samples. First, clustering is applied with Euclidean distance as the similarity measure between samples, and the center point of each cluster is selected as the oversampling point, which to some extent addresses the problem of insufficient sample importance. Second, sampling directly in the minority-class sample space addresses the lack of targeting in the clustering space of SMOTE, Cluster-SMOTE and similar methods. In addition, oversampling by 30% of the number of minority-class samples avoids two problems: cluster-based undersampling blindly pursues equal sample counts for the two classes, and SMOTE and similar algorithms have no clearly defined sampling rate. Experiments on 24 public class-imbalanced datasets are carried out to verify the effectiveness of the proposed method.
%U http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2104080
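
The abstract gives only an outline of the method, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that the minority class is clustered with EM on a spherical Gaussian mixture, that the number of clusters equals 30% of the minority sample count, and that each cluster center is added as one synthetic minority sample. The function names `em_cluster_centers` and `oversample_minority`, the spherical covariance, the iteration count and the seeding are all hypothetical choices made for illustration.

```python
import math
import random


def em_cluster_centers(points, k, iters=50, seed=0):
    """Fit a spherical Gaussian mixture with EM; return the k component means.

    Assumption: spherical covariances, so similarity between a sample and a
    component depends only on their squared Euclidean distance.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    means = [list(p) for p in rng.sample(points, k)]  # start from k data points
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for p in points:
            logs = []
            for m, v, w in zip(means, variances, weights):
                sq = sum((a - b) ** 2 for a, b in zip(p, m))
                logs.append(math.log(w) - 0.5 * dim * math.log(v) - sq / (2.0 * v))
            top = max(logs)
            exps = [math.exp(val - top) for val in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate mean, variance and weight of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # effectively empty component: keep old parameters
            means[j] = [
                sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                for d in range(dim)
            ]
            sq = sum(
                r[j] * sum((a - b) ** 2 for a, b in zip(p, means[j]))
                for r, p in zip(resp, points)
            )
            variances[j] = max(sq / (nj * dim), 1e-6)  # floor avoids collapse
            weights[j] = max(nj / len(points), 1e-12)
    return means


def oversample_minority(minority, rate=0.3, seed=0):
    """Append cluster centers as synthetic samples (assumed reading of the abstract).

    rate=0.3 mirrors the 30% oversampling ratio stated in the abstract.
    """
    n_new = max(1, round(rate * len(minority)))
    centers = em_cluster_centers(list(minority), n_new, seed=seed)
    return list(minority) + [tuple(c) for c in centers]
```

Because each center is a responsibility-weighted average of minority samples, every synthetic point lies inside the region already spanned by the minority class, which matches the abstract's emphasis on sampling within the minority sample space. For 20 minority samples, `oversample_minority` returns the original 20 followed by 6 synthetic points.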