计算机科学与探索 ›› 2013, Vol. 7 ›› Issue (11): 973-982.DOI: 10.3778/j.issn.1673-9418.1305053

• 学术研究 • 上一篇    下一篇

高度可伸缩的稀疏矩阵乘法

吴志川,毛  琛,韩  蕾,陈立军+   

  1. 北京大学 信息科学技术学院 计算机系,北京 100871
  • 出版日期:2013-11-01 发布日期:2013-11-04

Highly Scalable Sparse Matrix Multiplication

WU Zhichuan, MAO Chen, HAN Lei, CHEN Lijun+   

  1. Department of Computer Science and Technology, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  • Online:2013-11-01 Published:2013-11-04

摘要: 矩阵乘法是线性代数和图算法中非常重要的一个基本操作,而大规模数据处理中的矩阵往往是稀疏矩阵。MapReduce编程框架能够有效地支持海量数据的分布式计算。因此,对如何运用MapReduce编程框架实现超大规模稀疏矩阵的乘法进行了研究。传统矩阵乘法并行算法没有针对稀疏矩阵进行专门优化,导致计算过程中出现大量不必要的通信开销。提出了一种新的算法——CRM(column row multiplication)算法,并与传统的矩阵分块算法进行了比较。实验证明,CRM算法运行效率有很大的提高,并且具有高度的可伸缩性,适合在MapReduce平台上运行。

关键词: 稀疏矩阵乘法, 分布式计算, Hadoop

Abstract: Matrix multiplication is an important fundamental operation in algebra and graph algorithms. And matrixes are usually highly sparse when coming to massive data processing. MapReduce is a programming model which can process large data sets effectively. This paper focuses on how to deal with massive sparse matrix multiplication on top of MapReduce programming model. Block based matrix multiplication algorithms aren’t optimized for sparse matrix and produce large amount of redundant communication. This paper proposes a new algorithm named CRM (column row multiplication), and compares it with traditional block based matrix algorithms. The experimental results demonstrate that CRM has higher efficiency and scalability, is suitable for operating on MapReduce and outperforms traditional ways considerably.

Key words: sparse matrix multiplication, distributed computing, Hadoop