计算机科学与探索 ›› 2016, Vol. 10 ›› Issue (7): 901-914.DOI: 10.3778/j.issn.1673-9418.1509009

• 学术研究 • 上一篇    下一篇

分布共享环境下支持弹性伸缩的图处理框架

詹杭龙1,2,曹东刚1,2+,谢  冰1,2   

  1. 1. 北京大学 高可信软件技术教育部重点实验室,北京 100871
    2. 北京大学(天津滨海) 新一代信息技术研究院,天津 300450
  • 出版日期:2016-07-01 发布日期:2016-07-01

Graph Processing Framework Supporting Elastic Scalability in Distributed Shared Environment

ZHAN Hanglong1,2, CAO Donggang1,2+, XIE Bing1,2   

  1. 1. Key Lab of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China
    2. Beida (Binhai) Information Research, Tianjing 300450, China
  • Online:2016-07-01 Published:2016-07-01

摘要: 作为大数据处理的一种重要模式,图处理被广泛地应用在机器学习、数据统计和数据挖掘等场景中。在企业级应用中,多种类型的大数据处理框架通常会部署在同一个分布式集群中,其运行环境是开放、共享的,这时图处理需要考虑运算资源动态变化的问题。为了能适应这种动态性,更加充分地利用开放共享环境的资源,图处理框架应该具备弹性伸缩能力。通过调研,发现现有的图处理框架尚未完全实现弹性伸缩。为此,介绍了一种支持弹性伸缩的分布式并行图处理框架SParTaG。首先基于任务并行模型定义了图处理任务集及任务模型;其次基于任务迁移机制设计并实现了可动态伸缩的图处理框架;最后设计了一个基于负载均衡的调度算法,实现了动态伸缩的图处理过程。实验结果说明,SParTaG的性能与当前流行的开源图处理框架Giraph相近,且具有较好的弹性伸缩能力。

关键词: 图处理, 分布式并行计算, 弹性伸缩, 任务迁移

Abstract: As an important pattern in big data processing, graph processing has been widely used in many kinds of scenarios, such as machine learning, data statistics and data mining, etc. when running enterprise-level applications, various kinds of big-data processing frameworks are usually deployed in the same distributed cluster, so the runtime environment is open and shared. As a result, graph processing should consider the dynamic changes of computing resources. In order to adapt to this dynamics and make good use of computing resources, graph processing framework should have the ability of elastic scaling. However, current graph processing frameworks have not fully realized elastic scaling yet as far as this paper knows. This paper introduces the design and implementation of an elastic scalable parallel graph processing framework, SParTaG. SParTaG firstly defines the task set and task model in graph processing problem; then designs an elastic scalable framework based on task migration mechanism; and proposes a load-balancing based scheduling algorithm at last. Experiments show that SParTaG achieves performance parity with the currently popular open-source Giraph system, and it can run graph job well in an elastic scalable manner.

Key words: graph processing, distributed parallel computing, elastic scaling, task migration