Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2020, Vol. 14 ›› Issue (7): 1200-1210. DOI: 10.3778/j.issn.1673-9418.1907045

• Artificial Intelligence •

自适应概念漂移问题的增量集成分类算法

韩明明,孙广路,朱素霞   

  1. 哈尔滨理工大学 计算机科学与技术学院,哈尔滨 150080
  2. 哈尔滨理工大学 信息安全与智能技术研究中心,哈尔滨 150080
  • 出版日期:2020-07-01 发布日期:2020-08-12

Adaptive Incremental-Learning Ensemble Classification Approach for Concept Drift Problem

HAN Mingming, SUN Guanglu, ZHU Suxia   

  1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
  2. Research Center of Information Security and Intelligent Technology, Harbin University of Science and Technology, Harbin 150080, China
  • Online: 2020-07-01 Published: 2020-08-12

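Abstract (translated from the Chinese):

Data streams are non-stationary, a property known as concept drift, and the performance of machine learning models degrades as drift occurs. This paper studies how a classifier can adapt to concept drift and proposes an incremental-learning, boosting-style ensemble algorithm that takes small data chunks as input to handle data stream classification under concept drift. The algorithm has no complex parameters but places stricter requirements on its weak classifiers: at each step, unqualified weak classifiers are removed and new ones are added. During iterative incremental training, the weights of the samples and of the weak classifiers are updated according to the training error, and the predictions of the weak classifiers are finally combined by weighted voting. Experiments on five synthetic datasets with known drift and three real-world datasets with unknown drift, with comparisons against existing algorithms, show that the algorithm handles data stream classification under concept drift well.

Keywords: data stream classification, concept drift, ensemble algorithm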

Abstract:

Because data streams are non-stationary, the performance of a machine learning model degrades when concept drift occurs. This paper studies how a classifier can adapt to concept drift and proposes an incremental-learning ensemble algorithm that takes small data blocks as input to handle data stream classification under concept drift. The algorithm has no complex parameters, but it places stricter requirements on its weak classifiers: unqualified weak classifiers are removed and new weak classifiers are added in their place. During incremental training, the weights of the samples and of the weak classifiers are updated according to the training error. Finally, the predictions of the weak classifiers are combined by weighted voting. Experiments are conducted on five artificial datasets with known drift and three real datasets with unknown drift, and the algorithm is compared with four existing algorithms. The results show that the algorithm can handle data stream classification under concept drift.

Key words: data stream classification problem, concept drift, ensemble algorithm
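
The loop described in the abstract (remove unqualified weak classifiers, add a new one, update sample and classifier weights from the training error, combine by weighted voting) can be illustrated with a minimal Python sketch. The abstract does not give the actual update rules, so everything concrete here is an assumption: the AdaBoost-style weight formulas, the decision-stump weak learner, binary labels in {-1, +1}, and the hypothetical names `Stump`, `ChunkEnsemble`, and `max_error` are illustrative choices, not the authors' method.

```python
import math


class Stump:
    """Weighted decision stump (assumed weak learner): sign if x[f] > t else -sign."""

    def fit(self, X, y, w):
        best = (float("inf"), 0, 0.0, 1)
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                for s in (1, -1):
                    # Weighted error of the candidate rule on this chunk.
                    err = sum(wi for x, yi, wi in zip(X, y, w)
                              if (s if x[f] > t else -s) != yi)
                    if err < best[0]:
                        best = (err, f, t, s)
        _, self.feature, self.threshold, self.sign = best
        return self

    def predict(self, x):
        return self.sign if x[self.feature] > self.threshold else -self.sign


def weighted_error(model, X, y, w):
    wrong = sum(wi for x, yi, wi in zip(X, y, w) if model.predict(x) != yi)
    return wrong / sum(w)


def vote_weight(err, eps=1e-9):
    # AdaBoost-style classifier weight (an assumption, not taken from the paper).
    return 0.5 * math.log((1.0 - err) / max(err, eps))


def reweight(w, model, X, y, alpha):
    # Raise the weight of misclassified samples, lower the rest, then normalise.
    w = [wi * math.exp(-alpha * yi * model.predict(x)) for wi, x, yi in zip(w, X, y)]
    total = sum(w)
    return [wi / total for wi in w]


class ChunkEnsemble:
    def __init__(self, max_error=0.4):
        self.max_error = max_error  # hypothetical "qualification" threshold
        self.members = []           # list of (model, vote weight)

    def partial_fit(self, X, y):
        w = [1.0 / len(X)] * len(X)
        kept = []
        # Re-score existing members on the new chunk and drop unqualified ones.
        for model, _ in self.members:
            err = weighted_error(model, X, y, w)
            if err >= self.max_error:
                continue
            alpha = vote_weight(err)
            kept.append((model, alpha))
            w = reweight(w, model, X, y, alpha)
        # Train one new weak classifier on the reweighted chunk and add it.
        new = Stump().fit(X, y, w)
        kept.append((new, vote_weight(weighted_error(new, X, y, w))))
        self.members = kept
        return self

    def predict(self, x):
        # Weighted vote over all current members.
        score = sum(alpha * model.predict(x) for model, alpha in self.members)
        return 1 if score >= 0 else -1
```

In this sketch, a member trained on an old concept scores a high error on the first chunk after an abrupt drift and is discarded, so the ensemble's vote is taken over by the classifier trained on the new chunk. A real implementation would also need a cap on ensemble size and support for multi-class labels, neither of which the abstract specifies.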