计算机科学与探索 ›› 2016, Vol. 10 ›› Issue (2): 182-193.DOI: 10.3778/j.issn.1673-9418.1506006

• 数据库技术 • 上一篇    下一篇

基于MapReduce的连续概率Skyline查询

单观敏,董一鸿+,何贤芒   

  1. 宁波大学 信息科学与工程学院,浙江 宁波 315211
  • 出版日期:2016-02-01 发布日期:2016-02-03

Continuous Probabilistic Skyline Query Based on MapReduce

SHAN Guanmin, DONG Yihong+, HE Xianmang   

  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
  • Online:2016-02-01 Published:2016-02-03

摘要: 大数据对传统的Skyline研究产生了挑战,利用并行框架MapReduce计算大数据下的Skyline已成为一个研究热点。研究了不确定移动对象的Skyline查询问题,提出了一种MapReduce框架下基于事件跟踪的连续概率Skyline查询算法——MR-DTrack(domination-track algorithm based on MapReduce)。首先采用基于角度的划分方法保证负载均衡, 通过预计算获取Skyline集可能变化的时刻, 在Reduce阶段获取候选概率Skyline集;然后利用局部过滤点剪枝,减少计算开销;最后合并计算出全局概率Skyline集。在人工数据集和真实数据集上的实验验证了算法的有效性。

关键词: MapReduce, Hadoop, 概率Skyline, 不确定移动对象

Abstract: As big data has been a challenge to traditional Skyline research, computing Skyline using parallel framework of MapReduce is now a research hotspot. This paper studies a Skyline query of an uncertain moving object and proposes a continuous probabilistic Skyline query algorithm based on event tracking, named MR-DTrack (domination-track algorithm based on MapReduce). Firstly, partitioning method based on angular is adopted to make workload balance, a pre-computation is used to get the time when the Skyline sets change possibly, and the candidate probabilistic Skyline sets can be got in the Reduce stage. Then local filter points are used to prune in order to reduce computing costs. Finally, the global probabilistic Skyline set is computed by combining the candidate skyline sets. Experiments over artificial and real data sets prove the efficiency and effective of the new algorithm.

Key words: MapReduce, Hadoop, probabilistic Skyline, uncertain moving object