Journal of Frontiers of Computer Science and Technology ›› 2007, Vol. 1 ›› Issue (1): 108-115.

• Academic Research •

Identifying exceptions from data streams based on kernel estimation and interval clustering

ZHANG Shi-chao, YOU Xiao-fang, YUAN Ding-rong

  

  1. Faculty of Computer Science and Technology, Guangxi Normal University, Guilin, Guangxi 541004, China

  • Online: 2007-06-06 Published: 2007-06-06
  • Contact: ZHANG Shi-chao

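Discovering exception patterns in data streams based on kernel estimation and interval clustering*

张师超,尤晓芳,袁鼎荣

  1. College of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004

  • Corresponding author: 张师超 (ZHANG Shi-chao)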

Abstract:

This paper proposes a strategy for mining abnormal burst patterns from data streams using a sliding window under bounded resources (such as memory constraints). A compact data structure, TTI, consisting of three nested tiers of time intervals, is designed to monitor data entering the sliding window so that the current abnormality can be identified at any time. The threshold for identifying abnormal items is generated dynamically by an algorithm, KIC (kernel estimation and confidence interval clustering), whereas existing algorithms use predefined (static) thresholds; this leads to more accurate outputs. Based on this threshold, an algorithm, SWMA, is designed to reduce time and space complexity. The approach is evaluated experimentally on a simulated linear model, a non-linear model, and a real time series data stream. The results indicate that the method is efficient and promising.

Key words: data streams, kernel estimation, confidence interval clustering, exception patterns, sliding windows
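
The contrast between a static and a dynamically generated threshold can be illustrated with a minimal sketch. It is not the KIC, TTI, or SWMA procedure of the paper, whose details are not given in this abstract. It assumes a generic stand-in: a Gaussian kernel density estimate over the current sliding window, Silverman's rule-of-thumb bandwidth, and a threshold taken as the upper (1 - alpha) quantile of that estimate. The names kde_cdf, dynamic_threshold, flag_exceptions, window_size and alpha are hypothetical.

```python
import math
import statistics
from collections import deque


def kde_cdf(x, sample, h):
    """Gaussian-kernel estimate of P(X <= x) from the sample."""
    scale = h * math.sqrt(2.0)
    total = sum(0.5 * (1.0 + math.erf((x - s) / scale)) for s in sample)
    return total / len(sample)


def dynamic_threshold(sample, alpha=0.01):
    """Upper (1 - alpha) quantile of a Gaussian KDE fitted to the sample."""
    n = len(sample)
    sd = statistics.stdev(sample)
    # Silverman's rule-of-thumb bandwidth (an assumption of this sketch);
    # fall back to a tiny constant if every value in the window is equal.
    h = 1.06 * sd * n ** (-0.2) if sd > 0 else 1e-6
    lo, hi = min(sample) - 10 * h, max(sample) + 10 * h
    for _ in range(60):  # bisection on the monotone estimated CDF
        mid = (lo + hi) / 2.0
        if kde_cdf(mid, sample, h) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return hi


def flag_exceptions(stream, window_size=50, alpha=0.01):
    """Yield (index, value, threshold) for items above the window threshold."""
    window = deque(maxlen=window_size)  # bounded memory: oldest item drops out
    for i, x in enumerate(stream):
        if len(window) == window_size:
            t = dynamic_threshold(list(window), alpha)
            if x > t:
                yield i, x, t
        # Flagged items are still appended, so a burst raises later thresholds.
        window.append(x)
```

Because the threshold is recomputed from the window contents, it follows slow drifts in the stream that a fixed cut-off would either miss or flag continuously. This sketch recomputes the estimate from scratch for every item; reducing that cost is the kind of concern the abstract attributes to SWMA.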

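Abstract (from the Chinese original):

This paper studies the problem of discovering exception patterns in data streams. To ensure that the current exception patterns can be output at any time, a simple and effective data structure, the three-tier nested time interval pattern (TTI), is introduced to monitor the data stream. Whether newly arrived data is abnormal is judged not against a pre-assigned static threshold but against a dynamic threshold computed by an algorithm (KIC: kernel estimation and confidence interval clustering analysis), which improves accuracy while using only a small amount of memory. The SWMA algorithm further reduces time and space complexity. Finally, the accuracy, feasibility, and timeliness of the method are verified on a simulated linear model, a non-linear model, and a real data stream with timestamps.

Key words: data streams, kernel estimation, confidence interval clustering analysis, exception patterns, sliding windows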