Analysis and MPI Implementation of LQCD Dslash on Sunway TaihuLight

doi:10.3778/j.issn.1673-9418.1811029

Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue 10: 1664-1676. DOI: 10.3778/j.issn.1673-9418.1811029

ZHANG Miao, ZHOU Yu, CHEN Jianhai, HE Qinming, XU Shun, GONG Ming

1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310012, China
2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
3. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China

Online: 2019-10-01 Published: 2019-10-15

Abstract

Abstract: Sunway TaihuLight is a supercomputer with more than ten million cores, developed entirely independently in China. Many large-scale applications have already been ported to and optimized for this architecture. However, lattice quantum chromodynamics (LQCD) numerical simulation software from high energy physics has not yet been ported to or optimized for the Sunway platform, a gap that has drawn the attention of researchers. This paper studies the porting and optimization of LQCD on the Sunway platform. First, it reviews the development, in China and abroad, of parallel optimization of LQCD on different hardware architectures. Second, it ports LQCD to the Sunway platform by refactoring its hotspot module, Dslash. Third, targeting the heterogeneous many-core architecture and parallel model of the SW26010 processor, it implements heterogeneous parallelism on the computing processing element (CPE) cluster, direct memory access (DMA) transfers between the CPE local device memory (LDM) and main memory, message passing interface (MPI) communication between management processing elements (MPEs), and global reduction. Finally, in experiments, the optimized single core group (CG) version and the optimized 16-CG version achieve speedups of 165 times and 25 times, respectively, over the single-MPE version. The experiments also expose several important performance bottlenecks, which lays a foundation for further optimization of overall efficiency. The work also supports wider adoption of domestic supercomputing platforms.

Key words: lattice quantum chromodynamics (LQCD), Dslash, message passing interface (MPI), Sunway TaihuLight, many-core processor

ZHANG Miao, ZHOU Yu, CHEN Jianhai, HE Qinming, XU Shun, GONG Ming. Analysis and MPI Implementation of LQCD Dslash on Sunway TaihuLight[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(10): 1664-1676.
