Journal of Frontiers of Computer Science and Technology ›› 2016, Vol. 10 ›› Issue (1): 36-42. DOI: 10.3778/j.issn.1673-9418.1504009

• Database Technology •

Research on Dynamic Data Streams Classification with Noise Elimination Using Mutual Nearest Neighbor

LIU Sanmin1+, WANG Zhongqun2, LIU Tao1, XIU Yu1

  1. College of Computer and Information, Anhui Polytechnic University, Wuhu, Anhui 241000, China
  2. College of Management Engineering, Anhui Polytechnic University, Wuhu, Anhui 241000, China
  • Online: 2016-01-01 Published: 2016-01-07

Abstract: Application scenarios for dynamic data stream classification are growing, and distinguishing concept drift from noise in a dynamic data stream has become a central problem in data stream classification research. This paper proposes an incremental classification model with noise detection for dynamic data streams to address this problem. When a sample in the stream is incompatible with the concept held by the classification model, the mutual nearest neighbor idea is used to decide whether the sample is noise. On this basis, a support vector machine serves as the learner, and concept drift in the stream is handled through incremental learning. Under two incompatibility metrics, theoretical analysis and experiments show that the proposed classification scheme is effective and feasible.

Key words: mutual nearest neighbor, incremental learning, data stream classification, uncertainty, concept drift
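
The mutual nearest neighbor check mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact noise criterion, so the rule used here (a sample is flagged when none of its mutual neighbors shares its label) is an assumption, as are the function names, the toy data, and the choice of k = 3. The support vector machine learner and the incremental update are not shown.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def mutual_neighbors(points, i, k):
    """Indices j such that j is among i's k nearest and i is among j's k nearest."""
    return [j for j in k_nearest(points, i, k) if i in k_nearest(points, j, k)]


def looks_like_noise(points, labels, i, k=3):
    """Assumed rule (not taken from the paper): flag sample i as noise
    when none of its mutual nearest neighbors carries the same label."""
    return not any(labels[j] == labels[i] for j in mutual_neighbors(points, i, k))


# Hypothetical toy data: two tight clusters, plus one sample labeled "b"
# that sits inside cluster "a".
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # cluster "a"
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # cluster "b"
    (0.02, 0.03),                         # labeled "b" but located in cluster "a"
]
labels = ["a", "a", "a", "b", "b", "b", "b"]

flags = [looks_like_noise(points, labels, i) for i in range(len(points))]
print(flags)
```

In this toy setting only the last sample is flagged, because its mutual neighbors all belong to cluster "a". In a stream setting, such a check would presumably be applied only to samples that the current classifier finds incompatible, so that samples passing the check can be treated as evidence of concept drift and handed to the incremental learner.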