Parallel Dwarf Cube Construction in a MapReduce Environment

Journal of Frontiers of Computer Science and Technology, 2011, Vol. 5, Issue 5: 398-409.

Efficient Parallel Dwarf Data Cube Using MapReduce

SHI Jingang (师金钢), BAO Yubin (鲍玉斌), LENG Fangling (冷芳玲), YU Ge (于戈)

College of Information Science and Engineering, Northeastern University, Shenyang 110819, China

Online: 2011-05-01 Published: 2011-05-01

Abstract

Abstract: For data-intensive applications, this paper proposes a parallel Dwarf data cube construction algorithm based on the MapReduce framework. The algorithm partitions a traditional Dwarf cube into several independent sub-Dwarf cubes that are together equivalent to it, and uses MapReduce to build, query, and update the Dwarf cube in parallel. Experiments show that the parallel Dwarf algorithm combines the parallelism and high scalability of the MapReduce framework with the high data compression and self-indexing of the Dwarf cube structure. The parallel Dwarf cube stores the data cube in highly compressed form and supports fast construction and incremental updates. It also overcomes MapReduce's lack of an index, which enables fast queries on the data cube.

Key words: data-intensive computing, MapReduce, Dwarf, data cube
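
The partition-then-build idea in the abstract can be illustrated with a small in-memory sketch. This is not the authors' algorithm: the abstract does not say how the cube is split, so the sketch assumes the fact table is partitioned on the value of the first dimension, and it replaces the Dwarf structure with a plain dictionary of SUM aggregates (no prefix or suffix compression). The names `map_phase`, `shuffle`, `reduce_phase`, `query`, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # wildcard standing for "all values" of a dimension


def map_phase(rows):
    """Map step: key each fact row by its first dimension (assumed partitioning)."""
    for *dims, measure in rows:
        yield dims[0], (tuple(dims), measure)


def shuffle(pairs):
    """Group mapped pairs by key, standing in for the MapReduce shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(facts):
    """Reduce step: build one independent sub-cube of SUM aggregates.

    The first dimension is fixed inside a partition, so only the
    remaining dimensions are rolled up to ALL here.
    """
    cube = defaultdict(int)
    for dims, measure in facts:
        first, rest = dims[0], dims[1:]
        # Each mask picks, per dimension, "keep the value" or "roll up to ALL".
        for mask in product((False, True), repeat=len(rest)):
            cell = (first,) + tuple(ALL if m else d for d, m in zip(rest, mask))
            cube[cell] += measure
    return dict(cube)


def query(sub_cubes, cell):
    """Point query; ALL on the first dimension sums over every sub-cube."""
    first = cell[0]
    if first != ALL:
        return sub_cubes.get(first, {}).get(cell, 0)
    return sum(cube.get((key,) + cell[1:], 0) for key, cube in sub_cubes.items())


# Hypothetical fact rows: (store, customer, product, sales).
rows = [
    ("S1", "C1", "P1", 10),
    ("S1", "C2", "P1", 20),
    ("S2", "C1", "P2", 5),
]
sub_cubes = {key: reduce_phase(facts) for key, facts in shuffle(map_phase(rows)).items()}
```

With the sample rows, `query(sub_cubes, ("S1", ALL, "P1"))` returns 30 and touches a single sub-cube, while `query(sub_cubes, (ALL, "C1", ALL))` returns 15 and has to visit every sub-cube because the assumed partition key is rolled up.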

SHI Jingang, BAO Yubin, LENG Fangling, YU Ge. Efficient Parallel Dwarf Data Cube Using MapReduce[J]. Journal of Frontiers of Computer Science and Technology, 2011, 5(5): 398-409.
