Programming the VFDT Algorithm in Data Stream Management System*

doi:10.3778/j.issn.1673-9418.2010.08.001

Journal of Frontiers of Computer Science and Technology, 2010, Vol. 4, Issue (8): 673-682. DOI: 10.3778/j.issn.1673-9418.2010.08.001

Programming the VFDT Algorithm in Data Stream Management System*

YUAN Lei¹, ZHANG Yang²⁺, LI Mei¹, LI Xue³, WANG Yong⁴

1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
2. College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
3. School of Information Technology and Electrical Engineering, University of Queensland, Brisbane 4072, Australia
4. School of Computer, Northwestern Polytechnical University, Xi’an 710072, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-08-10 Published: 2010-08-10
Contact: ZHANG Yang

Abstract

Abstract: Integrating data stream mining algorithms with a data stream management system (DSMS) is a new challenge for data mining and database researchers, and the integration of the very fast decision tree (VFDT) algorithm with a DSMS has not yet been reported. This paper integrates the VFDT algorithm with Esper by exploiting the existing capabilities of the DSMS. The main idea is to transform the algorithm into efficient Esper query language (EQL) statements. Two implementations are proposed: (1) DVFDT, which transforms the VFDT algorithm into EQL directly; and (2) optimal-DVFDT, an optimized version of DVFDT based on the inherent batch mode of Esper. Both implementations are compared with a Java implementation of VFDT (denoted by JVFDT) in terms of classification accuracy and running time. Experiments on large volumes of synthetic data show that the implementations work efficiently and accurately. In addition, the approach also performs well on sub-streams of the original data stream.

Key words: data stream management system (DSMS), very fast decision tree (VFDT) algorithm, integration, classification
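
For readers unfamiliar with VFDT, the split decision at the core of the algorithm can be sketched as follows. This is a minimal Python illustration of the standard Hoeffding-bound test, not the paper's EQL or Java code; the function names and the default tie threshold are assumptions made for the example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Decide whether a leaf that has seen n examples should be split.

    best_gain and second_gain are the information gains of the two best
    attributes. tie_threshold is an assumed default for this sketch.
    """
    # Information gain ranges over [0, log2(number of classes)].
    eps = hoeffding_bound(math.log2(n_classes), delta, n)
    # Split when the best attribute is reliably better, or when the two
    # candidates are so close that waiting for more examples is pointless.
    return (best_gain - second_gain) > eps or eps < tie_threshold
```

A leaf calls `should_split` periodically as examples arrive; because the bound shrinks with `n`, a clear winner is accepted early while near-ties are deferred until the tie threshold applies.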

CLC number: TP181

YUAN Lei, ZHANG Yang, LI Mei, LI Xue, WANG Yong. Programming the VFDT Algorithm in Data Stream Management System[J]. Journal of Frontiers of Computer Science and Technology, 2010, 4(8): 673-682.
