Journal of Frontiers of Computer Science and Technology ›› 2011, Vol. 5 ›› Issue (5): 398-409.

• Academic Research •

Parallel Dwarf Cube Construction in a MapReduce Environment

师金钢, 鲍玉斌, 冷芳玲, 于 戈   

  1. College of Information Science and Engineering, Northeastern University, Shenyang 110819
  • Received:1900-01-01 Revised:1900-01-01 Online:2011-05-01 Published:2011-05-01

Efficient Parallel Dwarf Data Cube Using MapReduce

SHI Jingang, BAO Yubin, LENG Fangling, YU Ge   

  1. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2011-05-01 Published:2011-05-01

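Abstract: For data-intensive applications, a parallel Dwarf data cube construction algorithm based on the MapReduce framework is proposed. The algorithm splits a traditional Dwarf cube into multiple equivalent, independent sub-Dwarf cubes and uses the MapReduce architecture to build, query, and update the Dwarf cube in parallel. Experiments show that the parallel Dwarf algorithm draws on both the parallelism and high scalability of the MapReduce framework and the high data compression and self-indexing of the Dwarf cube structure. The parallel Dwarf cube provides highly compressed storage of the data cube together with fast construction and incremental updates, and it overcomes the lack of indexing in MapReduce, enabling fast queries over the data cube.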

Keywords: data-intensive computing, MapReduce, Dwarf, data cube

Abstract: For data-intensive computing, this paper proposes an efficient parallel algorithm for constructing Dwarf data cubes with the MapReduce framework. The algorithm partitions a traditional Dwarf cube into several equivalent, independent sub-Dwarf cubes, and then uses MapReduce to build, query, and update them in parallel. Experiments show that the parallel Dwarf algorithm combines the parallelism and high scalability of MapReduce with the high compression and self-indexing of the Dwarf cube structure. As a result, the parallel Dwarf cube stores the data cube at a high compression ratio, supports fast construction and incremental updates, and compensates for the absence of indexes in MapReduce, which makes fast queries on the data cube possible.

Key words: data-intensive computing, MapReduce, Dwarf, data cube
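
The partition-then-build-independently idea described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the authors' algorithm. The partitioning rule (one sub-cube per value of the first dimension), the SUM measure, and all function names are assumptions made for illustration. Each sub-cube here is a plain table of aggregates, without the prefix and suffix coalescing that gives a real Dwarf its compression.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # wildcard meaning "all values" of a dimension


def map_phase(facts):
    """Map: key each fact by its first-dimension value (assumed partitioning rule)."""
    for *dims, measure in facts:
        yield dims[0], (tuple(dims[1:]), measure)


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce runtime would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(rows):
    """Reduce: build one independent sub-cube with SUM aggregates for every group-by."""
    cube = defaultdict(int)
    for dims, measure in rows:
        # Each remaining dimension is either kept or rolled up to ALL.
        for cell in product(*[(d, ALL) for d in dims]):
            cube[cell] += measure
    return dict(cube)


def build(facts):
    """Run map, shuffle, and reduce in-process; each key yields one sub-cube."""
    return {key: reduce_phase(rows) for key, rows in shuffle(map_phase(facts)).items()}


def query(sub_cubes, first, rest):
    """Point query; ALL on the first dimension sums the answer over every sub-cube."""
    if first == ALL:
        return sum(cube.get(rest, 0) for cube in sub_cubes.values())
    return sub_cubes.get(first, {}).get(rest, 0)


if __name__ == "__main__":
    # Hypothetical fact table: (store, customer, product, amount)
    facts = [("S1", "C1", "P1", 10), ("S1", "C1", "P2", 20), ("S2", "C1", "P1", 5)]
    cubes = build(facts)
    print(query(cubes, "S1", ("C1", ALL)))
    print(query(cubes, ALL, (ALL, "P1")))
```

Because each sub-cube depends only on the facts routed to its key, the reduce calls can run on separate workers, and an incremental update only needs to touch the sub-cubes whose keys appear in the new facts.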