计算机科学与探索 ›› 2018, Vol. 12 ›› Issue (6): 886-897.DOI: 10.3778/j.issn.1673-9418.1709036

• 数据库技术 • 上一篇    下一篇

基于Pregel模型的分布式图着色算法

甘  瀛1,2,王  鑫1,2+,冯志勇2,3,杨雅君1,2   

  1. 1. 天津大学 计算机科学与技术学院,天津 300354
    2. 天津市认知计算与应用重点实验室,天津 300354
    3. 天津大学 软件学院,天津 300354
  • 出版日期:2018-06-01 发布日期:2018-06-06

Distributed Graph Coloring Algorithm Based on Pregel Model

GAN Ying1,2, WANG Xin1,2+, FENG Zhiyong2,3, YANG Yajun1,2   

  1. 1. School of Computer Science and Technology, Tianjin University, Tianjin 300354, China
    2. Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300354, China
    3. School of Computer Software, Tianjin University, Tianjin 300354, China
  • Online:2018-06-01 Published:2018-06-06

摘要: 图着色问题一直是计算机科学和数学领域最著名和经典的研究问题之一。由于目前图数据规模的不断增加,单机图着色算法性能受到限制。现有的分布式图着色算法大多基于共享内存的消息传递模型,而无共享Pregel计算模型的提出与发展提高了大规模图数据的处理能力,其已成为现今大数据处理的主流框架之一,但尚缺少将现有的分布式图着色算法适配到Pregel模型进行算法研究与实验比较的工作。为了提高图着色算法的性能,受经典图着色算法MIS(maximal-independent-set)启发,设计了一种基于Pregel模型的分布式图着色算法MIS-Pregel。结合着色时间和所需颜色数等方面提出了两种不同的优化策略,第一种优化策略基于JP算法,第二种优化策略基于LDF算法。在实现了主流图数据处理模型Pregel的Spark GraphX框架下开发了上述MIS-Pregel算法和两种改进算法JP-Pregel和LDF-Pregel。在合成数据集和真实数据集上进行了实验,大量实验结果表明所提分布式图着色算法能够高效地完成图着色任务,且JP-Pregel算法和LDF-Pregel算法的着色时间比MIS-Pregel算法分别平均缩短了26.4%和30.9%。

关键词: 分布式图着色, Pregel模型, Spark, GraphX

Abstract: The graph coloring problem is one of the most famous and classical research questions in the field of computer science and mathematics. With the increasing of data scale, the performance of graph coloring algorithms is limited. And existing distributed graph coloring algorithms are mostly based on shared-memory message passing model. However, the development of Pregel model that has a share-nothing architecture has enhanced the data processing capability, and it has been the key technology for large-scale graph-data processing. But there is no related work to improve the existing distributed graph coloring algorithms to adapt share-nothing Pregel model and make an algorithm research and experimental comparison. In order to improve the performance of graph coloring algorithms, inspired by the classical graph coloring algorithm MIS (maximal-independent-set), this paper devises a distributed graph coloring algorithm MIS-Pregel based on the Pregel model. Then, this paper proposes two strategies to optimize the time for coloring and total number of colors, the first optimization strategy is based on the JP algorithm, and the second optimization strategy is based on the LDF algorithm. This paper implements the basic algorithm MIS-Pregel and two optimized algorithms (JP-Pregel and LDF-Pregel) based on above optimization strategies on Spark GraphX. Finally, extensive experiments show that the proposed basic algorithm has high efficiency of coloring and the performance of the optimization algorithms is improved by 26.4% and 30.9% than the basic algorithm over both synthetic and real datasets.

Key words: distributed graph coloring, Pregel model, Spark, GraphX