计算机科学与探索 ›› 2011, Vol. 5 ›› Issue (5): 385-397.

• 学术研究 • 上一篇    下一篇

MapReduce框架下的Skyline计算

张波良1,2, 周水庚1,2, 关佶红3   

  1. 1. 复旦大学 计算机科学技术学院, 上海 200433
    2. 复旦大学 上海市智能信息处理重点实验室, 上海 200433
    3. 同济大学 计算机科学与技术系, 上海 201804
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2011-05-01 发布日期:2011-05-01

Skyline Computation under MapReduce Framework

ZHANG Boliang1,2, ZHOU Shuigeng1,2, GUAN Jihong3   

  1. 1. School of Computer Science, Fudan University, Shanghai 200433, China
    2. Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China 3. Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2011-05-01 Published:2011-05-01

摘要: 由于Skyline查询广泛应用于多目标决策、数据可视化等领域, 近年来成为数据库领域的一个研究热点。针对云计算环境, 在MapReduce框架下设计并实现了Skyline算法。MapReduce是一个运行在大型集群上处理海量数据的并行计算框架, 其主要思想是任务的分解与结果的汇总。基于不同的数据划分思想, 实施了三种Skyline并行算法, 分别是基于MapReduce的块嵌套循环算法(MapReduce based block-nested- loops, MR-BNL)、基于MapReduce的排序过滤算法(MapReduce based sort-filter-skyline, MR-SFS)以及基于MapReduce的位图算法(MapReduce based bitmap, MR-Bitmap), 并针对这三种算法进行了系统的实验比较, 得出了不同数据分布、维数、缓存等因素对算法性能的影响结果。

关键词: Skyline计算, 云计算, MapReduce, 数据划分

Abstract: Skyline computation, due to its wide applications in multi-objective decision making and data visualization, has attracted many research interests in database community recently. Aiming at cloud computing applications, this paper addresses the problem of Skyline computation under the MapReduce framework. As a parallel programming model for data-intensive computing applications, MapReduce runs on a cluster of commercial PCs with the main idea of task decomposition and solution reduction. Based on different data division strategies, this paper proposes three algorithms: MapReduce based block-nested-loops (MR-BNL), MapReduce based sort-filter-skyline (MR-SFS) and MapReduce based bitmap (MR-Bitmap). It conducts extensive experiments to evaluate and compare the three algorithms under different situations of different data distributions, dimensions and buffer sizes.

Key words: Skyline computation, cloud computing, MapReduce, data division