Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (6): 1374-1382. DOI: 10.3778/j.issn.1673-9418.2012100
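Received: 2020-12-10
Revised: 2021-02-05
Online: 2022-06-01
Published: 2021-03-03
Corresponding author: E-mail: 6191611052@stu.jiangnan.edu.cn
About the author: ZHANG Zhuang (born 1998), male, from Xiantao, Hubei, M.S. candidate. His research interests include artificial intelligence and pattern recognition.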
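Abstract:
Ensemble learning is one of the mainstream modeling approaches for nonlinear systems. However, when a conventional ensemble of TSK fuzzy models is applied directly to an imbalanced dataset, its learning performance is easily affected by the class imbalance, which often results in poor generalization. To address this problem, an ensemble classification model for imbalanced data is proposed based on the TSK fuzzy model. The basic idea is as follows. First, the imbalanced sample set is preprocessed with the SMOTE oversampling method so that the class distribution becomes relatively balanced. Next, the AdaBoost method is introduced to train the ensemble of TSK fuzzy models: during ensemble construction, samples are drawn at random according to their weights, and the weights are updated iteratively over multiple training rounds. Finally, the outputs of the generated models are combined by a specific weighting scheme to produce the final output. In this way each model is sufficiently trained, which improves the generalization ability of the whole ensemble TSK fuzzy model. On this basis, the corresponding ensemble TSK fuzzy model for imbalanced data is proposed and evaluated on several datasets, where it performs well in terms of both mean squared error and accuracy. The number of models, the number of rules, and other parameters are then varied to study their effect on model performance, with the trends shown in plots. The experimental results demonstrate the effectiveness of the proposed ensemble learning algorithm.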
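As an illustration only, the pipeline outlined in the abstract (SMOTE oversampling, then AdaBoost-style training with weight-based resampling, then a weighted combination) can be sketched in Python with NumPy. The abstract does not give the exact formulation, so several details here are assumptions: labels are coded as -1/+1, the base learner is a simplified first-order TSK model with Gaussian antecedents centred on randomly chosen training points and ridge least-squares consequents, and the "specific weighting scheme" is replaced by the standard discrete AdaBoost vote. All names (`smote`, `TSK`, `fit_ensemble`, `predict_ensemble`, `demo`) are hypothetical.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest minority neighbours
    base = rng.integers(0, len(X_min), n_new)   # random base samples
    neigh = nn[base, rng.integers(0, k, n_new)] # one random neighbour per base
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

class TSK:
    """Simplified first-order TSK model (assumed form, not the paper's)."""
    def __init__(self, n_rules=3, lam=1e-2, rng=None):
        self.c, self.lam = n_rules, lam
        self.rng = np.random.default_rng(rng)

    def _design(self, X):
        d2 = ((X[:, None, :] - self.centers[None]) ** 2 / self.width).sum(axis=2)
        mu = np.exp(-0.5 * d2)
        mu /= mu.sum(axis=1, keepdims=True) + 1e-12   # normalised firing strengths
        Xe = np.hstack([np.ones((len(X), 1)), X])     # linear consequent input [1, x]
        return (mu[:, :, None] * Xe[:, None, :]).reshape(len(X), -1)

    def fit(self, X, y):
        idx = self.rng.choice(len(X), self.c, replace=False)
        self.centers = X[idx]                         # rule centres: random samples
        self.width = X.var(axis=0) + 1e-6
        G = self._design(X)
        A = G.T @ G + self.lam * np.eye(G.shape[1])
        self.p = np.linalg.solve(A, G.T @ y)          # ridge least squares
        return self

    def predict(self, X):
        return self._design(X) @ self.p

def fit_ensemble(X, y, T=6, n_rules=3, seed=0):
    """Train T models; y must contain -1/+1. Returns (alpha, model) pairs."""
    rng = np.random.default_rng(seed)
    w = np.full(len(X), 1.0 / len(X))
    models = []
    for _ in range(T):
        idx = rng.choice(len(X), size=len(X), p=w)    # resample by current weights
        m = TSK(n_rules, rng=rng).fit(X[idx], y[idx])
        pred = np.sign(m.predict(X))
        pred[pred == 0] = 1
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)                # raise weights of mistakes
        w /= w.sum()
        models.append((alpha, m))
    return models

def predict_ensemble(models, X):
    """Weighted vote of the base models' signed outputs."""
    score = sum(a * np.sign(m.predict(X)) for a, m in models)
    return np.where(score >= 0, 1, -1)

def demo(seed=0):
    """Balance a 200-vs-20 toy problem with SMOTE, then train and score."""
    rng = np.random.default_rng(seed)
    X_maj = rng.normal(0.0, 1.0, (200, 2))
    X_min = rng.normal(2.5, 1.0, (20, 2))
    X_syn = smote(X_min, n_new=180, rng=rng)
    X = np.vstack([X_maj, X_min, X_syn])
    y = np.r_[-np.ones(200), np.ones(200)]
    models = fit_ensemble(X, y, T=6, n_rules=3, seed=seed)
    return (predict_ensemble(models, X) == y).mean()
```

Calling `demo()` returns the training accuracy on the balanced toy set. The synthetic data, the random choice of rule centres, and the ridge parameter are placeholders chosen to keep the sketch short, so the result says nothing about the figures reported in the paper.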
ZHANG Zhuang, WANG Shitong. Ensemble Model of Takagi-Sugeno-Kang Fuzzy Classifiers for Imbalanced Data[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(6): 1374-1382.
Dataset | Samples | Attributes |
---|---|---|
appendicitis | 106 | 7 |
banana | 5,300 | 2 |
banknote | 1,372 | 5 |
ionosphere | 351 | 33 |
phoneme | 5,404 | 5 |
stability | 10,000 | 14 |
eye | 14,980 | 15 |
magic | 19,020 | 11 |
Table 1 Summary of datasets
Dataset | TSK | ETSK | ETSK-ID |
---|---|---|---|
appendicitis | 1.0485±0.6160 | 1.0256±0.3443 | 0.9995±0.6497 |
banana | 0.3676±0.0330 | 0.3405±0.0203 | 0.3342±0.0214 |
banknote | 0.0130±0.0029 | 0.0104±0.0017 | 0.0102±0.0019 |
ionosphere | 0.4386±0.1948 | 0.3829±0.1689 | 0.3599±0.0845 |
phoneme | 0.1432±0.0053 | 0.1276±0.0050 | 0.1197±0.0080 |
stability | 0.0669±0.0017 | 0.0659±0.0016 | 0.0634±0.0023 |
eye | 0.2045±0.0470 | 0.1915±0.0247 | 0.1852±0.0052 |
magic | 0.1546±0.0021 | 0.1487±0.0171 | 0.1351±0.0033 |
Table 2 MSE obtained for various first-order models (c = 9, T = 6)
Dataset | TSK | ETSK | ETSK-ID |
---|---|---|---|
appendicitis | 0.8634±0.4010 | 0.5678±0.4261 | 0.2406±0.1667 |
banana | 0.2959±0.0257 | 0.2906±0.0232 | 0.2897±0.0280 |
banknote | 0.0025±0.0008 | 0.0024±0.0005 | 0.0021±0.0005 |
ionosphere | 0.3805±0.1534 | 0.2787±0.2628 | 0.2423±0.1574 |
phoneme | 0.1220±0.0067 | 0.1044±0.0070 | 0.1028±0.0078 |
stability | 0.0553±0.0015 | 0.0550±0.0015 | 0.0544±0.0016 |
eye | 0.1185±0.0213 | 0.1073±0.0253 | 0.0868±0.0071 |
magic | 0.1243±0.0155 | 0.1216±0.0052 | 0.1148±0.0044 |
Table 3 MSE obtained for various second-order models (c = 9, T = 6)
Dataset | TSK | ETSK | ETSK-ID |
---|---|---|---|
appendicitis | 72.47±6.84 | 80.00±8.13 | 83.52±4.80 |
banana | 90.18±1.14 | 90.35±1.21 | 90.63±1.38 |
banknote | 94.51±6.71 | 95.07±6.29 | 96.40±6.00 |
ionosphere | 84.89±6.97 | 87.27±4.18 | 88.66±5.59 |
phoneme | 84.17±1.31 | 86.46±1.46 | 86.92±1.56 |
stability | 96.93±0.40 | 96.98±0.51 | 97.83±0.31 |
eye | 90.84±0.72 | 91.49±0.84 | 92.42±0.78 |
magic | 83.05±0.72 | 85.08±0.90 | 85.39±0.48 |
Table 4 Accuracy (%) obtained for various second-order models (c = 9, T = 6)
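The tables report each metric as mean±standard deviation. The following is a minimal sketch of how such entries could be computed. It assumes labels coded as -1/+1, a real-valued model output thresholded at 0, and the population standard deviation taken over repeated runs. None of these choices is stated in the source, and the helper names are hypothetical.

```python
import numpy as np

def mse(y_true, y_out):
    """Mean squared error between target labels and real-valued outputs."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_out)) ** 2))

def accuracy_pct(y_true, y_out):
    """Accuracy in percent after thresholding the outputs at 0 (assumed rule)."""
    y_pred = np.where(np.asarray(y_out) >= 0, 1, -1)
    return 100.0 * float(np.mean(y_pred == np.asarray(y_true)))

def summarize(values, digits=4):
    """Format per-run metric values as 'mean±std' (population std, ddof=0)."""
    v = np.asarray(values, dtype=float)
    return f"{v.mean():.{digits}f}±{v.std():.{digits}f}"
```

In use, `mse` or `accuracy_pct` would be evaluated once per run or fold, and the resulting list passed to `summarize`, with `digits=4` for MSE entries and `digits=2` for accuracy entries.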