Programming the VFDT Algorithm in Data Stream Management System*


Journal of Frontiers of Computer Science and Technology, 2010, Vol. 4, Issue (8): 673-682. DOI: 10.3778/j.issn.1673-9418.2010.08.001

• Academic Research •


YUAN Lei¹, ZHANG Yang²⁺, LI Mei¹, LI Xue³, WANG Yong⁴

1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
2. College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
3. School of Information Technology and Electrical Engineering, University of Queensland, Brisbane 4072, Australia
4. School of Computer, Northwestern Polytechnical University, Xi’an 710072, China

Online: 2010-08-10 Published: 2010-08-10
Contact: ZHANG Yang


Abstract

Integrating data stream mining algorithms with a data stream management system (DSMS) is a new challenge for data mining and database researchers, yet the integration of the very fast decision tree (VFDT) algorithm with a DSMS has not been reported to date. This paper integrates VFDT with Esper by exploiting the existing capabilities of the DSMS. It analyzes how to transform the algorithm into efficient Esper query language (EQL) statements and proposes two implementations for integrating VFDT with a DSMS: a straightforward transformation of VFDT into EQL (denoted DVFDT), and an optimized version of DVFDT based on Esper's inherent batch mode (denoted optimal-DVFDT). Both implementations are compared with a Java implementation of VFDT (denoted JVFDT) in terms of classification accuracy and performance. Experiments on large volumes of synthetic data show that the implementations work efficiently and accurately. In addition, the approach also performs well on sub-streams of the original data stream.
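For orientation, VFDT (the Hoeffding-tree learner of Domingos and Hulten) keeps class counts n_ijk for every attribute value at each leaf and splits a leaf only when the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)) shows that the best attribute's information gain exceeds the second best's by more than ε, or when ε drops below a tie threshold τ. The Python sketch below shows that split test under simplifying assumptions: discrete attributes, information gain as the heuristic, the test evaluated on every call rather than every n_min examples, and no pre-pruning. It is not the authors' DVFDT or EQL code; the names `Leaf`, `split_attribute`, `delta`, and `tie` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)) from the Hoeffding inequality."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def entropy(counts) -> float:
    """Shannon entropy (bits) of a collection of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


class Leaf:
    """Hypothetical leaf holding sufficient statistics n_ijk (attribute i, value j, class k)."""

    def __init__(self, attributes):
        self.attributes = list(attributes)
        self.n = 0
        self.class_counts = Counter()
        self.nijk = {a: defaultdict(Counter) for a in self.attributes}

    def update(self, example: dict, label) -> None:
        # Each example is read once and only increments counters.
        self.n += 1
        self.class_counts[label] += 1
        for a in self.attributes:
            self.nijk[a][example[a]][label] += 1

    def info_gain(self, attr) -> float:
        base = entropy(self.class_counts.values())
        remainder = sum(
            sum(by_class.values()) / self.n * entropy(by_class.values())
            for by_class in self.nijk[attr].values()
        )
        return base - remainder

    def split_attribute(self, delta: float = 1e-7, tie: float = 0.05):
        """Return the attribute to split on, or None while the Hoeffding test is not met."""
        if self.n == 0 or len(self.attributes) < 2:
            return None
        ranked = sorted(((self.info_gain(a), a) for a in self.attributes), reverse=True)
        (g1, best), (g2, _) = ranked[0], ranked[1]
        # With c classes, information gain lies in [0, log2 c], so R = log2 c.
        value_range = math.log2(max(len(self.class_counts), 2))
        eps = hoeffding_bound(value_range, delta, self.n)
        return best if (g1 - g2 > eps or eps < tie) else None
```

Because the bound depends only on n, δ, and R, the leaf never needs to revisit old examples: with probability 1 - δ the attribute chosen from n examples matches the one that infinite data would choose. This single-pass, count-only update is what makes VFDT a plausible candidate for expression as continuous queries over a stream.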

Key words: data stream management system (DSMS), very fast decision tree (VFDT) algorithm, integration, classification

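Chinese abstract: Embedding data mining algorithms in a data stream management system (DSMS) is a new challenge for database researchers, and embedding the very fast decision tree (VFDT) in a DSMS has not yet been reported. This paper implements the VFDT algorithm in Esper using the DSMS's existing mechanisms. The main idea is to transform the VFDT algorithm into Esper's query language (Esper query language, EQL). Two methods for implementing VFDT in a DSMS are given: the basic method, which translates VFDT directly into EQL and runs it in the DSMS (denoted DVFDT), and the improved method, which relies on Esper's inherent batch processing mode (denoted optimal-DVFDT). A series of experiments compares the classification accuracy and performance of the two methods on massive data streams, and both methods are compared with a Java implementation of VFDT (denoted JVFDT) in classification accuracy and running time. The results show that the VFDT algorithm implemented in the DSMS performs well, and that it also performs well on subsets of large-scale data streams.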
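The contrast between the per-event implementation (DVFDT) and the batch-mode one (optimal-DVFDT) can be illustrated, very loosely, outside Esper. The Python sketch below assumes each example is flattened into (attribute, value, class) triples and compares updating the leaf counts once per event with first aggregating a whole batch, in the spirit of a count-based batch window, and then merging. The names `triples`, `per_event`, `batches`, `per_batch`, the triple layout, and the batch size are hypothetical; this is not the EQL of either implementation and does not use Esper's API. It only shows that both orders yield identical counts while the batched version evaluates the split test once per batch instead of once per event.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (attribute -> value map, class label).
Example = Tuple[Dict[str, str], str]


def triples(example: Example) -> List[Tuple[str, str, str]]:
    """Flatten one example into (attribute, value, class) keys for the n_ijk counts."""
    attrs, label = example
    return [(a, v, label) for a, v in attrs.items()]


def per_event(stream: Iterable[Example]) -> Tuple[Counter, int]:
    """Update counts for every arriving event; a split check would follow each one."""
    counts: Counter = Counter()
    checks = 0
    for ex in stream:
        counts.update(triples(ex))
        checks += 1
    return counts, checks


def batches(stream: Iterable[Example], size: int) -> Iterator[List[Example]]:
    """Cut the stream into consecutive, non-overlapping batches of `size` events."""
    batch: List[Example] = []
    for ex in stream:
        batch.append(ex)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch so totals can be compared
        yield batch


def per_batch(stream: Iterable[Example], size: int) -> Tuple[Counter, int]:
    """Aggregate each batch (like a grouped count), merge it, then check once per batch."""
    counts: Counter = Counter()
    checks = 0
    for batch in batches(stream, size):
        counts.update(t for ex in batch for t in triples(ex))
        checks += 1
    return counts, checks
```

A count-based batch window typically holds events until the batch is full; this sketch flushes the remainder only so that the two strategies can be checked against each other on a finite list.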

Chinese key words: data management system, VFDT algorithm, embedding, classification

CLC Number: TP181

YUAN Lei, ZHANG Yang, LI Mei, LI Xue, WANG Yong. Programming the VFDT Algorithm in Data Stream Management System[J]. Journal of Frontiers of Computer Science and Technology, 2010, 4(8): 673-682.

