Design and Optimization of Parallel LZMA for Many-Core Sunway Processor

doi:10.3778/j.issn.1673-9418.1909070

Abstract

The growth of high-performance computing and scientific computing applications has sharply increased the volume of data that high-performance computing cluster systems transmit, store, and process. Efficient compression of large-scale data reduces both the storage space and the communication bandwidth required, and is therefore one of the keys to improving the performance of such systems. Among lossless compression algorithms, LZMA (Lempel-Ziv-Markov chain algorithm) achieves a high compression ratio, but its serial implementation compresses slowly. Parallelizing lossless compression on multi-core processors is one research direction for raising compression speed. This paper designs and implements a parallel LZMA algorithm for the Sunway 26010 heterogeneous many-core processor. Taking the processor's characteristics into account, it analyzes the storage requirements, memory access behavior, and hotspot functions of LZMA. Based on the Athread interface, LZMA is parallelized across multiple threads on the slave cores, with the sliding window algorithm restructured accordingly. The layout of the LDM address space is tuned at a fine granularity to improve cache performance, and a cyclic sliding window algorithm is implemented with DMA double buffering. Test results show that, compared with the serial version running on the master core, the parallel LZMA algorithm achieves maximum speedups of 4.1 times on the Silesia Corpus benchmark and 5.3 times on a large-scale data set.

Key words: parallel computing, heterogeneous many-core processors, Lempel-Ziv-Markov chain algorithm (LZMA), compression algorithm
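
The abstract names a cyclic sliding window fed by DMA double buffering but does not give code. The following is only a minimal sketch of that general idea, not the authors' implementation: it is a synchronous Python simulation in which two staging buffers alternate while chunks are copied into a ring buffer that serves as the sliding window. The names `stream_through_window` and `dma_get`, and the `chunk` and `window` sizes, are hypothetical and chosen for illustration; on the real processor the copy would be an asynchronous DMA transfer into LDM issued through the Athread interface.

```python
def stream_through_window(data: bytes, chunk: int = 8, window: int = 16):
    """Feed `data` through two alternating chunk buffers into a circular window.

    Returns the window contents in oldest-to-newest order and the number of
    simulated DMA transfers that were issued.
    """
    if not data:
        return b"", 0

    buffers = [b"", b""]      # stand-ins for two staging buffers in local memory
    ring = bytearray(window)  # circular sliding window
    head = 0                  # next write position in the ring
    filled = 0                # number of valid bytes currently in the ring
    transfers = 0

    def dma_get(offset: int) -> bytes:
        # Hypothetical stand-in for an asynchronous DMA read from main memory.
        nonlocal transfers
        transfers += 1
        return data[offset:offset + chunk]

    buffers[0] = dma_get(0)   # prefetch the first chunk
    cur, offset = 0, 0
    while buffers[cur]:
        nxt = 1 - cur
        next_offset = offset + chunk
        # Issue the next transfer before touching the current chunk, so that
        # on real hardware the copy could overlap with the processing below.
        buffers[nxt] = dma_get(next_offset) if next_offset < len(data) else b""
        for byte in buffers[cur]:
            ring[head] = byte              # overwrite the oldest byte when full
            head = (head + 1) % window     # wrap around instead of shifting data
            filled = min(filled + 1, window)
        cur, offset = nxt, next_offset

    start = (head - filled) % window
    ordered = bytes(ring[(start + i) % window] for i in range(filled))
    return ordered, transfers
```

For example, streaming 20 bytes with `chunk=8` and `window=16` issues three simulated transfers and leaves the last 16 input bytes in the window. The ring buffer avoids moving the window contents each time new data arrives, which is the property a cyclic window is meant to provide; the actual buffer sizes and overlap behavior on the Sunway 26010 are not specified in the abstract.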

LI Bingzheng, HUANG Gaoyang, XU Jinchen. Design and Optimization of Parallel LZMA for Many-Core Sunway Processor[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(9): 1501-1509.
