Journal of Frontiers of Computer Science and Technology ›› 2013, Vol. 7 ›› Issue (6): 545-550. DOI: 10.3778/j.issn.1673-9418.1210006

• Academic Research •

Drift Detection Method for Positive Samples in Skewed Data Streams

张玉红+,胡学钢,张  娟   

  1. School of Computer and Information, Hefei University of Technology, Hefei 230009
  • Online: 2013-06-01 Published: 2013-05-30

Concept Drift Detection of Positive Class in Skewed Data Streams

ZHANG Yuhong+, HU Xuegang, ZHANG Juan   

  1. School of Computer and Information, Hefei University of Technology, Hefei 230009, China
  • Online: 2013-06-01 Published: 2013-05-30

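Abstract: Concept drift is widespread in skewed data, yet most existing methods for detecting concept drift in data streams assume a balanced class distribution and are therefore hard to apply to skewed data streams. To address this, a concept drift detection method for skewed data streams based on the distribution of positive instances, CDPSD, is proposed. It first applies an improved resampling method that avoids sampling instances of different concepts into the same data block, and builds classifiers. It then detects concept drift and updates the classifiers by monitoring changes in the class distribution of the positive instances rather than of all instances. Experiments show that CDPSD detects concept drift in skewed data streams promptly and updates the classification model quickly, improving the classification of positive-class samples.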

Key words: concept drift, skewed data streams, resampling, classification, positive class

Abstract: Concept drift is common in skewed data streams (SDS). However, most concept drift detection algorithms assume that the class distribution of a data stream is balanced, which makes them unsuitable for skewed data streams. This paper therefore proposes CDPSD, a concept drift detection approach for SDS. First, it adopts a modified resampling method that keeps instances of different concepts in different data blocks, and then builds the classifiers. Second, it uses the class distribution of the positive instances, rather than of all instances, to detect concept drift and update the classifiers. Experiments show that CDPSD detects concept drift and updates the classifier in time, and improves the classification results for positive instances.

Key words: concept drift, skewed data streams, resampling, classification, positive class
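
The abstract does not state the statistic or threshold that CDPSD uses, so the following Python sketch only illustrates the two ideas it names: monitoring the positive instances alone for drift, and keeping instances of different concepts out of the same resampled block. The function names (`positive_shift`, `detect_drifts`, `balanced_sample`), the mean-shift test, and the threshold of 2.5 are all assumptions made for illustration and are not taken from the paper.

```python
import random
from statistics import mean, pstdev


def positives(block):
    """Feature vectors of the positive (label == 1) instances in a block."""
    return [x for x, y in block if y == 1]


def positive_shift(ref_pos, new_pos):
    """Largest per-feature mean shift, in units of the reference std (assumed test)."""
    shifts = []
    for j in range(len(ref_pos[0])):
        ref_col = [x[j] for x in ref_pos]
        new_col = [x[j] for x in new_pos]
        scale = pstdev(ref_col) or 1.0  # fall back to 1.0 if the spread is zero
        shifts.append(abs(mean(new_col) - mean(ref_col)) / scale)
    return max(shifts)


def detect_drifts(blocks, threshold=2.5):
    """Indices of blocks whose positives moved away from the pooled reference."""
    drift_at = []
    ref_pos = []  # positives collected since the last detected drift
    for i, block in enumerate(blocks):
        pos = positives(block)
        if not pos:
            continue  # a block without positives carries no evidence here
        if ref_pos and positive_shift(ref_pos, pos) > threshold:
            drift_at.append(i)
            ref_pos = []  # drop positives that belong to the old concept
        ref_pos.extend(pos)
    return drift_at


def balanced_sample(block, ref_pos, rng):
    """Pooled current-concept positives plus an equal-sized sample of negatives."""
    negs = [(x, 0) for x, y in block if y == 0]
    pos = [(x, 1) for x in ref_pos]
    k = min(len(negs), len(pos))
    return pos + rng.sample(negs, k)


# Hand-made one-feature blocks of (features, label); label 1 is the rare class.
stable = [([1.0], 1), ([1.2], 1), ([0.8], 1), ([0.0], 0), ([0.1], 0)]
moved = [([5.0], 1), ([5.2], 1), ([4.8], 1), ([0.0], 0), ([0.1], 0)]
print(detect_drifts([stable, stable, moved, moved]))  # → [2]
```

Because the reference pool is emptied whenever a drift is flagged, `balanced_sample` only ever draws positives from the current concept. A change confined to the negative class leaves `detect_drifts` silent, which is the behavior the positive-only view implies.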