Efficient Parallel Dwarf Data Cube Using MapReduce

Journal of Frontiers of Computer Science and Technology, 2011, Vol. 5, Issue 5: 398-409.

• Academic Research •

SHI Jingang, BAO Yubin, LENG Fangling, YU Ge

College of Information Science and Engineering, Northeastern University, Shenyang 110819, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-05-01 Published: 2011-05-01

MapReduce环境下的并行Dwarf立方构建

师金钢, 鲍玉斌, 冷芳玲, 于戈

东北大学信息科学与工程学院, 沈阳 110819

Abstract: For data-intensive computing, this paper proposes an efficient parallel Dwarf data cube construction algorithm based on the MapReduce framework. The algorithm partitions the traditional Dwarf cube into several equivalent, independent sub-Dwarf cubes and uses MapReduce to build, query, and update them in parallel. Experiments show that the parallel Dwarf algorithm combines the parallelism and scalability of the MapReduce framework with the high compression and self-indexing of the Dwarf cube structure. It achieves a high compression ratio for cube storage and supports fast construction and incremental updates. It also overcomes the lack of an index in MapReduce, which enables fast queries on the data cube.

Key words: data-intensive computing, MapReduce, Dwarf, data cube
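
The partition-then-build idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: facts are routed to partitions by hashing their first dimension, the measure is aggregated with SUM, and each sub-cube is a plain dictionary rather than a real Dwarf structure (no prefix or suffix coalescing is performed). The names `map_phase`, `reduce_phase`, `build`, and `query` are hypothetical, and the map and reduce steps are simulated in a single process rather than run on a MapReduce cluster.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # wildcard: aggregate over this dimension


def map_phase(facts, num_partitions):
    """Map step (assumed scheme): route each fact by its first dimension."""
    partitions = defaultdict(list)
    for dims, measure in facts:
        partitions[hash(dims[0]) % num_partitions].append((dims, measure))
    return partitions


def reduce_phase(partition_facts):
    """Reduce step: build one independent sub-cube of SUM aggregates.

    The leading dimension keeps its concrete value; every other dimension
    is stored both as its value and as ALL. This is an uncompressed
    stand-in for a sub-Dwarf, not the Dwarf structure itself.
    """
    cube = defaultdict(int)
    for dims, measure in partition_facts:
        first, rest = dims[0], dims[1:]
        for combo in product(*[(d, ALL) for d in rest]):
            cube[(first,) + combo] += measure
    return dict(cube)


def build(facts, num_partitions=2):
    """Run the simulated map and reduce steps; return sub-cubes by partition id."""
    parts = map_phase(facts, num_partitions)
    return {pid: reduce_phase(rows) for pid, rows in parts.items()}


def query(sub_cubes, num_partitions, key):
    """Answer a point or aggregate query against the partitioned cube."""
    key = tuple(key)
    if key[0] != ALL:
        # A concrete leading value lives in exactly one sub-cube.
        pid = hash(key[0]) % num_partitions
        return sub_cubes.get(pid, {}).get(key, 0)
    # ALL on the leading dimension: combine partial sums from every sub-cube.
    return sum(
        value
        for cube in sub_cubes.values()
        for k, value in cube.items()
        if k[1:] == key[1:]
    )


# Made-up example facts: (store, customer, product) -> sales amount
facts = [
    (("s1", "c1", "p1"), 70),
    (("s1", "c2", "p1"), 40),
    (("s2", "c1", "p1"), 90),
    (("s2", "c1", "p2"), 50),
]
sub_cubes = build(facts, num_partitions=2)
print(query(sub_cubes, 2, ("s1", ALL, ALL)))  # total sales for store s1
print(query(sub_cubes, 2, (ALL, "c1", ALL)))  # total sales for customer c1
```

Because each sub-cube is built only from its own partition, the reduce steps are independent of one another, and a query with a concrete leading value touches a single sub-cube. Queries that aggregate over the leading dimension must visit every sub-cube in this sketch; how the paper handles that case is not described in the abstract.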

SHI Jingang, BAO Yubin, LENG Fangling, YU Ge. Efficient Parallel Dwarf Data Cube Using MapReduce[J]. Journal of Frontiers of Computer Science and Technology, 2011, 5(5): 398-409.

师金钢, 鲍玉斌, 冷芳玲, 于戈. MapReduce环境下的并行Dwarf立方构建[J]. 计算机科学与探索, 2011, 5(5): 398-409.
