Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2015, Vol. 9 ›› Issue (4): 410-417. DOI: 10.3778/j.issn.1673-9418.1409083

• Academic Research •

Research on Optimization of Sorting Algorithm Based on MapReduce

JIANG Yong (蒋勇)1+, ZHAO Zuopeng (赵作鹏)2

  1. Department of Information Technology, Jiangsu Union Technical Institute, Xuzhou, Jiangsu 221008, China
  2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221008, China
  • Online: 2015-04-01 Published: 2015-04-02

Abstract: MapReduce has become the standard parallel computing model for big data. To keep all nodes participating in a MapReduce computation highly load-balanced while minimizing metrics such as space usage, CPU time, I/O time, and network transfer overhead, this paper proposes a design specification for optimized MapReduce algorithms that optimizes multiple metrics simultaneously while preserving good parallelism. It then gives a theoretical analysis of sorting, the most important algorithm in data processing, presents an optimal algorithm under the multi-metric constraints, and proves that this algorithm satisfies the proposed specification. Finally, experiments verify that the optimized sorting algorithm strictly outperforms the traditional sorting algorithm in both effectiveness and efficiency.

Key words: MapReduce, optimization algorithm, big data, sorting algorithm