Distributed Graph Coloring Algorithm Based on Pregel Model

doi:10.3778/j.issn.1673-9418.1709036

Journal of Frontiers of Computer Science and Technology, 2018, Vol. 12, Issue (6): 886-897.

GAN Ying1,2, WANG Xin1,2+, FENG Zhiyong2,3, YANG Yajun1,2

1. School of Computer Science and Technology, Tianjin University, Tianjin 300354, China
2. Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300354, China
3. School of Computer Software, Tianjin University, Tianjin 300354, China

Online: 2018-06-01 Published: 2018-06-06

Abstract

Graph coloring is one of the best-known classical problems in computer science and mathematics. As graph data keeps growing, the performance of single-machine graph coloring algorithms becomes a bottleneck. Most existing distributed graph coloring algorithms are built on shared-memory message-passing models. Meanwhile, the shared-nothing Pregel computing model has improved the ability to process large-scale graph data and has become one of the mainstream frameworks for big-data processing, yet no prior work has adapted existing distributed graph coloring algorithms to the Pregel model and compared them experimentally. To improve graph coloring performance, and inspired by the classical graph coloring algorithm MIS (maximal-independent-set), this paper designs MIS-Pregel, a distributed graph coloring algorithm based on the Pregel model. It then proposes two optimization strategies that target coloring time and the number of colors required: the first is based on the JP algorithm and the second on the LDF algorithm. MIS-Pregel and the two optimized algorithms, JP-Pregel and LDF-Pregel, are implemented on Spark GraphX, a framework that implements the Pregel model. Extensive experiments on synthetic and real datasets show that the proposed algorithms complete graph coloring efficiently, and that JP-Pregel and LDF-Pregel reduce coloring time by 26.4% and 30.9% on average, respectively, compared with MIS-Pregel.

Key words: distributed graph coloring, Pregel model, Spark, GraphX
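
The round-based idea behind priority-driven coloring can be illustrated with a small single-machine sketch. It is not the authors' Spark GraphX implementation. It assumes that JP refers to Jones-Plassmann (random priorities) and LDF to largest-degree-first (degree as priority), their usual expansions, and it simulates Pregel-style supersteps with a plain loop. All function names here (`superstep_coloring`, `jp_priority`, `ldf_priority`, `is_proper`) are hypothetical.

```python
import random


def superstep_coloring(adj, priority):
    """Color a graph in synchronous rounds that mimic Pregel supersteps.

    adj: dict mapping each vertex to the set of its neighbors (undirected).
    priority: dict mapping each vertex to a unique, comparable key.
    Returns (color, rounds): a vertex -> color index dict and the round count.
    """
    color = {}
    rounds = 0
    while len(color) < len(adj):
        rounds += 1
        # "Message" phase: an uncolored vertex wins the round if its priority
        # beats every uncolored neighbor. Winners are pairwise non-adjacent.
        winners = [
            v for v in adj
            if v not in color
            and all(priority[v] > priority[u] for u in adj[v] if u not in color)
        ]
        # "Compute" phase: each winner takes the smallest color that none of
        # its already-colored neighbors uses.
        for v in winners:
            used = {color[u] for u in adj[v] if u in color}
            c = 0
            while c in used:
                c += 1
            color[v] = c
    return color, rounds


def jp_priority(adj, seed=0):
    """Jones-Plassmann-style priority: random value, vertex id breaks ties."""
    rng = random.Random(seed)
    return {v: (rng.random(), v) for v in adj}


def ldf_priority(adj, seed=0):
    """Largest-degree-first priority: degree first, then a random tie-break."""
    rng = random.Random(seed)
    return {v: (len(adj[v]), rng.random(), v) for v in adj}


def is_proper(adj, color):
    """Check that no edge joins two vertices of the same color."""
    return all(color[v] != color[u] for v in adj for u in adj[v])


if __name__ == "__main__":
    # A 5-cycle, which needs three colors.
    cycle = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
    for make_priority in (jp_priority, ldf_priority):
        coloring, n_rounds = superstep_coloring(cycle, make_priority(cycle))
        print(make_priority.__name__, coloring, n_rounds, is_proper(cycle, coloring))
```

In a real Pregel program the priority comparison would be carried by messages between neighboring vertices rather than by direct dictionary lookups, and the loop would end when no vertex remains active.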

GAN Ying, WANG Xin, FENG Zhiyong, YANG Yajun. Distributed Graph Coloring Algorithm Based on Pregel Model[J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(6): 886-897.
