Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2012, Vol. 6, Issue 10: 877-887. DOI: 10.3778/j.issn.1673-9418.2012.10.002

• Academic Research •

Query Optimization of Big Data Partition in Hybrid MapReduce System

LI Fu, ZHU Qing+ (+ corresponding author)

  1. Department of Computer Science, School of Information, Renmin University of China, Beijing 100872, China
  • Published: 2012-10-01; Online: 2012-09-28

Abstract: In hybrid architectures that integrate MapReduce with databases, data partitioning is an important factor in query performance. Join and aggregation (GROUP BY) operations are the most expensive. When they are implemented in a hybrid MapReduce system, they require large-scale data transfer across nodes and therefore incur heavy network transmission and I/O costs. To reduce the volume of data transferred and improve the efficiency of join queries, this paper proposes a partition recommender model. First, it designs and implements a partition recommender for the hybrid architecture. The recommender computes partitioning costs and generates the optimal data partitioning scheme, which improves system efficiency. Second, building on this model, it proposes a cost-priority-based generation strategy and a space-pruning search algorithm. Together they reduce the time the recommender needs to produce the optimal scheme. Finally, experiments verify the effectiveness of the partition recommender: it minimizes the overall query cost and significantly improves the query performance of the hybrid MapReduce architecture.
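The abstract does not give the recommender's cost model or search procedure. As a rough illustration only, the Python sketch below shows the general shape of a cost-based partition recommender with priority-ordered candidate generation and a pruned search space.

It assumes a simplified, hypothetical network-cost model:

- A join is free when both inputs are co-partitioned on their join columns.
- Otherwise it costs the cheaper of two plans: repartitioning the mis-partitioned inputs, or broadcasting the smaller table to the other nodes.

It then finds the cheapest scheme by depth-first branch-and-bound. The names `JoinQuery`, `query_cost`, `workload_cost` and `recommend_partitioning` are hypothetical, as are the cost formula and the priority weights. None of them is the authors' model.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class JoinQuery:
    """One equi-join in the workload (hypothetical representation)."""
    left: str        # left table name
    left_col: str    # join column on the left table
    right: str       # right table name
    right_col: str   # join column on the right table
    freq: int = 1    # how often this join occurs in the workload


def query_cost(q, scheme, sizes, nodes):
    """Hypothetical network cost of one join under a (partial) scheme.

    A table missing from `scheme` is optimistically treated as already
    partitioned on its join column, so for a partial scheme the result is
    a lower bound on the cost of any completion of that scheme.
    """
    moved = 0
    for table, col in ((q.left, q.left_col), (q.right, q.right_col)):
        key = scheme.get(table)
        if key is not None and key != col:
            moved += sizes[table]  # this input must be repartitioned
    # Alternative plan: broadcast the smaller input to the other nodes.
    broadcast = min(sizes[q.left], sizes[q.right]) * (nodes - 1)
    return q.freq * min(moved, broadcast)


def workload_cost(queries, scheme, sizes, nodes):
    """Total (or, for a partial scheme, lower-bound) cost of the workload."""
    return sum(query_cost(q, scheme, sizes, nodes) for q in queries)


def recommend_partitioning(queries, sizes, nodes):
    """Choose one partition key per table by depth-first branch-and-bound."""
    # Priority weight of each candidate key: data volume (frequency x size)
    # that avoids repartitioning when the table is partitioned on that column.
    weight = {}
    for q in queries:
        for table, col in ((q.left, q.left_col), (q.right, q.right_col)):
            weight[(table, col)] = weight.get((table, col), 0) + q.freq * sizes[table]
    candidates = {}
    for (table, col), w in weight.items():
        candidates.setdefault(table, []).append((w, col))
    # Heaviest tables first; within a table, heaviest key first, so a cheap
    # scheme is found early and the bound prunes more branches.
    tables = sorted(candidates, key=lambda t: -sum(w for w, _ in candidates[t]))
    for t in tables:
        candidates[t].sort(reverse=True)

    best = {"cost": inf, "scheme": None}

    def search(i, scheme):
        bound = workload_cost(queries, scheme, sizes, nodes)
        if bound >= best["cost"]:
            return  # prune: no completion can beat the best scheme so far
        if i == len(tables):
            best["cost"], best["scheme"] = bound, dict(scheme)
            return
        table = tables[i]
        for _, col in candidates[table]:
            scheme[table] = col
            search(i + 1, scheme)
            del scheme[table]

    search(0, {})
    return best["scheme"], best["cost"]


if __name__ == "__main__":
    sizes = {"lineitem": 400, "orders": 100, "customer": 20}  # abstract size units
    workload = [
        JoinQuery("lineitem", "l_orderkey", "orders", "o_orderkey", freq=3),
        JoinQuery("orders", "o_custkey", "customer", "c_custkey", freq=1),
    ]
    scheme, cost = recommend_partitioning(workload, sizes, nodes=4)
    print(scheme, cost)
```

With this sample workload the sketch partitions `orders` on `o_orderkey`. That keeps the frequent, large `lineitem`/`orders` join local. The small `customer` join is then paid for by broadcasting `customer`, for a total of 60 units, versus 300 if `orders` were partitioned on `o_custkey` instead. Assigning another table can only increase the partial-scheme cost, so that cost is a valid lower bound and pruning never discards the optimum. The priority ordering only changes how early a good incumbent is found, and therefore how much of the search space is skipped.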

Key words: hybrid architecture, query optimization, partition recommender