计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2018, Vol. 12, Issue 11: 1767-1776. DOI: 10.3778/j.issn.1673-9418.1710032

• System Software and Software Engineering •

Time Minimized Task Scheduling in Hadoop with SDN

SUN Huaiying, YU Huiqun, FAN Guisheng, CHEN Liqiong   

  1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  2. Shanghai Key Laboratory of Computer Software Evaluating and Testing, Shanghai 201112, China
  3. Department of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 200235, China
  • Online: 2018-11-01 Published: 2018-11-12

Abstract:

Software defined network (SDN) is a network architecture that separates the network control functions from the infrastructure devices and deploys them centrally in a controller. In real-world Hadoop systems, minimizing the job completion time is an NP-complete problem. This paper introduces SDN into Hadoop. Using the network control ability of SDN, the available residual bandwidth of the network is obtained and used as an important parameter of task scheduling. On this basis, a task scheduling algorithm, RBA (residual bandwidth based algorithm), is proposed. RBA yields an approximately optimal allocation scheme for the tasks in a job, thereby minimizing the job completion time. Simulation experiments are conducted to verify the performance of RBA in terms of job completion time, task data locality, and computation time. Experimental results show that RBA generally outperforms the HDS, BAR, and BASS algorithms.

Key words: software defined network (SDN), Hadoop, bandwidth, task scheduling
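
The abstract does not give the details of RBA, so the following is only a minimal sketch of the general idea it describes: treating residual bandwidth as a scheduling parameter when estimating task finish times. It is not the authors' algorithm. The names `Node`, `estimate_finish`, and `schedule`, the greedy earliest-finish rule, and the assumption of one residual-bandwidth value per destination node (which is not reduced after an assignment) are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node; all fields are illustrative assumptions."""
    name: str
    free_at: float = 0.0                            # time (s) when the node becomes idle
    local_blocks: set = field(default_factory=set)  # data blocks stored on this node


def estimate_finish(node, block, size_mb, compute_s, bandwidth_mbps):
    """Estimate when a task would finish if it were placed on `node`.

    A data-local task has no transfer cost. Otherwise the transfer time is
    the block size divided by the residual bandwidth toward the node.
    """
    if block in node.local_blocks:
        transfer_s = 0.0
    else:
        bw = bandwidth_mbps.get(node.name, 0.0)
        if bw <= 0:
            return float("inf")                     # node unreachable for remote data
        transfer_s = size_mb * 8 / bw               # MB -> megabits, then / Mbps
    return node.free_at + transfer_s + compute_s


def schedule(tasks, nodes, bandwidth_mbps):
    """Greedily place each task on the node with the earliest estimated finish.

    `tasks` is a list of (block, size_mb, compute_s) tuples. Returns a dict
    mapping each block to (node name, estimated finish time).
    """
    plan = {}
    for block, size_mb, compute_s in tasks:
        best = min(
            nodes,
            key=lambda n: estimate_finish(n, block, size_mb, compute_s, bandwidth_mbps),
        )
        finish = estimate_finish(best, block, size_mb, compute_s, bandwidth_mbps)
        best.free_at = finish                       # the node is busy until then
        plan[block] = (best.name, finish)
    return plan


if __name__ == "__main__":
    nodes = [Node("A", local_blocks={"b1"}), Node("B", local_blocks={"b2"})]
    residual = {"A": 100.0, "B": 800.0}             # assumed residual bandwidth (Mbps)
    tasks = [("b1", 100, 10.0), ("b2", 100, 10.0), ("b3", 100, 10.0)]
    for block, (name, finish) in schedule(tasks, nodes, residual).items():
        print(block, name, finish)
```

In this toy input, block `b3` is local to neither node, so the placement is decided by the residual bandwidth toward each node rather than by locality. In an SDN-enabled cluster, the bandwidth values would come from the controller; here they are hard-coded.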