Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (6): 1374-1382. DOI: 10.3778/j.issn.1673-9418.2012100

• Artificial Intelligence •

不平衡数据的Takagi-Sugeno-Kang模糊分类集成模型

张壮, 王士同

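  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2020-12-10 Revised: 2021-02-05 Online: 2022-06-01 Published: 2021-03-03
  • Corresponding author: + E-mail: 6191611052@stu.jiangnan.edu.cn
  • About author: ZHANG Zhuang (born 1998), male, from Xiantao, Hubei, M.S. candidate. His research interests include artificial intelligence and pattern recognition.
    WANG Shitong (born 1964), male, from Yangzhou, Jiangsu, professor, Ph.D. supervisor, member of CCF. His research interests include artificial intelligence, pattern recognition, etc.
  • Supported by:
    Natural Science Foundation of Jiangsu Province (BK20191331)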

Ensemble Model of Takagi-Sugeno-Kang Fuzzy Classifiers for Imbalanced Data

ZHANG Zhuang, WANG Shitong

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2020-12-10 Revised: 2021-02-05 Online: 2022-06-01 Published: 2021-03-03
  • About author:ZHANG Zhuang, born in 1998, M.S. candidate. His research interests include artificial intelligence and pattern recognition.
    WANG Shitong, born in 1964, professor, Ph.D. supervisor, member of CCF. His research interests include artificial intelligence, pattern recognition, etc.
  • Supported by:
    Natural Science Foundation of Jiangsu Province (BK20191331)

Abstract:

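Ensemble learning is one of the mainstream modeling methods for nonlinear systems. However, when a conventional ensemble TSK fuzzy model is applied directly to an imbalanced dataset, its learning performance is easily affected by the class imbalance, which often leads to poor generalization. To address this problem, a classification ensemble model for imbalanced data is proposed on the basis of the TSK fuzzy model. The basic idea is as follows: first, the imbalanced sample set is preprocessed with the SMOTE oversampling method so that the class distribution becomes relatively balanced; next, the AdaBoost method is introduced to train the ensemble TSK fuzzy model, where samples are randomly drawn according to their weights and the weights are updated iteratively over multiple training rounds; finally, the outputs of the generated models are combined by a specific weighting scheme to produce the final output. In this way each model is fully trained, which improves the generalization ability of the whole ensemble TSK fuzzy model. The corresponding ensemble TSK fuzzy model for imbalanced data is thus proposed and evaluated experimentally on multiple datasets, where it performs well in terms of both mean squared error and accuracy. Parameters such as the number of models and the number of rules are then varied to study their influence on model performance, and the resulting trends are shown in plots. The experimental results demonstrate the effectiveness of the proposed ensemble learning algorithm.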
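The SMOTE preprocessing step mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `smote`, the neighbour count `k`, and the uniform interpolation factor are assumptions chosen for illustration, since the abstract does not state the settings used in the paper.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples (illustrative sketch).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)      # which sample to start from
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))               # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

To balance a two-class dataset under these assumptions, one would call `smote` on the minority-class rows with `n_new` set to the difference between the class sizes and append the result, labelled as minority, to the training set.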

Keywords: TSK fuzzy model, ensemble learning, AdaBoost, imbalanced data, SMOTE

Abstract:

Ensemble learning is one of the most popular modeling methods for nonlinear systems. However, when traditional ensemble models of TSK fuzzy classifiers are applied directly to imbalanced data, their learning performance deteriorates and their generalization ability is poor. To tackle this issue, a TSK-fuzzy-classifier-based ensemble method for imbalanced data is proposed in this paper. The basic idea is to first preprocess the imbalanced dataset with the SMOTE method to obtain a relatively balanced dataset, and then to use the AdaBoost strategy to combine several TSK fuzzy classifiers so that satisfactory performance is obtained. During training of the proposed ensemble model, the data are randomly sampled according to weights that are updated dynamically and iteratively, and the sampled data are fed as inputs to each TSK fuzzy model. The trained TSK fuzzy classifiers are then weighted to form the final output of the ensemble model. In this way each model can be fully trained, and the generalization ability of the whole ensemble TSK fuzzy model is improved. The resulting ensemble TSK fuzzy model for imbalanced data is evaluated experimentally on multiple datasets using the mean square error (MSE) and accuracy. The number of models and the number of rules are then varied to explore their effect on model performance, and plots are used to show the resulting changes. The experimental results demonstrate the effectiveness of the proposed ensemble model.
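The training loop described in the abstract (weight-driven resampling, iterative weight updates, weighted combination of the trained classifiers) can be sketched as follows. This is a hedged illustration rather than the paper's algorithm: `SimpleTSK` is a hypothetical stand-in for a first-order TSK fuzzy classifier with randomly chosen Gaussian rule centers, a shared width, and ridge least-squares consequents, and the update rule is the standard binary AdaBoost rule with labels assumed to be in {-1, +1}. The names `fit_ensemble` and `predict_ensemble` and all default parameter values are likewise assumptions.

```python
import numpy as np

class SimpleTSK:
    """First-order TSK fuzzy classifier (illustrative stand-in, not the paper's model)."""

    def __init__(self, n_rules=3, width=2.0, reg=1e-3, rng=None):
        self.n_rules, self.width, self.reg = n_rules, width, reg
        self.rng = np.random.default_rng(rng)

    def _design(self, X):
        # Gaussian firing strength of every rule, normalised per sample.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        fire = np.exp(-d2 / (2 * self.width ** 2))
        fire /= fire.sum(axis=1, keepdims=True) + 1e-12
        Xe = np.hstack([np.ones((len(X), 1)), X])      # [1, x] for linear consequents
        # Each rule contributes firing * [1, x]; rule blocks are laid side by side.
        return (fire[:, :, None] * Xe[:, None, :]).reshape(len(X), -1)

    def fit(self, X, y):
        idx = self.rng.choice(len(X), size=self.n_rules, replace=False)
        self.centers = X[idx]                          # assumption: random rule centers
        G = self._design(X)
        A = G.T @ G + self.reg * np.eye(G.shape[1])    # ridge-regularised normal equations
        self.coef = np.linalg.solve(A, G.T @ y)
        return self

    def predict(self, X):
        return np.where(self._design(X) @ self.coef >= 0, 1, -1)


def fit_ensemble(X, y, n_models=5, n_rules=3, seed=0):
    """AdaBoost by resampling; y must contain labels -1 and +1."""
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)                            # start from uniform sample weights
    models, alphas = [], []
    for _ in range(n_models):
        idx = rng.choice(n, size=n, replace=True, p=w) # draw samples according to weights
        model = SimpleTSK(n_rules=n_rules, rng=rng).fit(X[idx], y[idx])
        miss = model.predict(X) != y
        err = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10) # weighted error on the full set
        if err >= 0.5:                                 # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(alpha * np.where(miss, 1.0, -1.0)) # raise weights of misclassified samples
        w /= w.sum()
        models.append(model)
        alphas.append(alpha)
    return models, np.array(alphas)


def predict_ensemble(models, alphas, X):
    # Weighted vote of the individual classifiers.
    votes = sum(a * m.predict(X) for a, m in zip(alphas, models))
    return np.where(votes >= 0, 1, -1)
```

In this sketch the training set is assumed to have been balanced beforehand, for example by SMOTE oversampling. `n_models` and `n_rules` correspond to the two parameters the abstract reports varying.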

Key words: TSK fuzzy model, ensemble learning, AdaBoost, imbalanced data, SMOTE

CLC Number: