Journal of Frontiers of Computer Science and Technology ›› 2015, Vol. 9 ›› Issue (5): 546-554. DOI: 10.3778/j.issn.1673-9418.1405049

• Academic Research •

云环境下分层的中间数据容错方法

宋宝燕,李雪城,任  才,丁琳琳+   

  1. 辽宁大学 信息学院,沈阳 110036
  • 出版日期:2015-05-01 发布日期:2015-05-06

Layered Intermediate Data Fault-Tolerance Approach in Cloud

SONG Baoyan, LI Xuecheng, REN Cai, DING Linlin+   

  1. School of Information, Liaoning University, Shenyang 110036, China
  • Online: 2015-05-01 Published: 2015-05-06

摘要: 通常在云计算框架的处理过程中会产生大量的、短暂的,同时又非常重要的中间数据。一旦有服务器失效,将会导致中间数据失效,进而影响整个任务的计算。现有的数据容错处理方法仅仅采用简单的复制策略,没有考虑中间数据的特点,会带来庞大的网络开销。因此,提出了一种有效的分层中间数据容错方法,即IDF_Support(intermediate data fault-tolerance_support)方法。通过将计算任务划分为不同类别,IDF_Support方法能够有效地处理中间数据失效。提出了分层的中间数据容错算法,分别是用于解决一个任务内部容错的中间数据容错算法(Inner_Task IDF)和用于解决任务间容错的中间数据容错算法(Outer_Task IDF)。实验结果表明,这些算法在机器出现故障的情况下提高了作业响应时间,保证了系统的可靠性。

关键词: 云计算, 中间数据, 副本, 容错算法

Abstract: Cloud computing frameworks usually generate large amounts of intermediate data that are short-lived yet essential to job completion. A server failure causes the loss of intermediate data, which in turn affects the computation of the whole job. Existing fault-tolerance approaches adopt only simple replication strategies, which do not take the characteristics of intermediate data into account and incur significant network overhead. This paper therefore proposes an efficient layered intermediate data fault-tolerance approach named IDF_Support (intermediate data fault-tolerance_support). By dividing computing tasks into different categories, IDF_Support handles intermediate data failures effectively. Two layered intermediate data fault-tolerance algorithms are proposed: the inner-task algorithm (Inner_Task IDF), which handles fault tolerance within a task, and the outer-task algorithm (Outer_Task IDF), which handles fault tolerance among tasks. Experimental results show that the proposed algorithms improve job response time when machines fail and maintain the reliability of the whole system.

Keywords: cloud computing, intermediate data, replication, fault-tolerant algorithm