Parallelization and Optimization of Laser-Plasma-Interaction Simulation

doi:10.3778/j.issn.1673-9418.1611092

Journal of Frontiers of Computer Science and Technology, 2018, Vol. 12, Issue (4): 550-558. DOI: 10.3778/j.issn.1673-9418.1611092

WU Haipeng1, WEN Minhua1+, SEE Simon2, LIN James1,3

1. Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, China
2. NVIDIA Technology Center, Singapore
3. Global Scientific Information and Computing Center, Tokyo Institute of Technology, Tokyo, Japan

Online: 2018-04-01 Published: 2018-04-04

激光等离子体相互作用模拟的并行和加速研究

武海鹏1，文敏华1+，SEE Simon2，林新华1,3

1. 上海交通大学高性能计算中心，上海 200240
2. NVIDIA Technology Center，新加坡
3. 东京工业大学学术国际情报中心，日本东京

Abstract

Abstract: Progress in generating intense ultra-short laser pulses increasingly demands kinetic descriptions of the interaction of such pulses with plasmas. The particle-in-cell (PIC) algorithm is a widely used method in plasma physics for studying the trajectories of charged particles under electromagnetic fields. Although several GPU implementations of the PIC algorithm exist, some important issues specific to laser-plasma-interaction simulation still need to be clarified in detail. This paper introduces a way to convert the original CPU laser-plasma-interaction code into a parameterized, adaptive GPU implementation with the whole algorithm ported. It then develops a series of methods to speed up the particle scatter phase: a dynamic duplication algorithm, mixed-precision computing, and a parameterized particle sorting algorithm. Furthermore, it applies the GPUDirect RDMA (remote direct memory access) technique on a Kepler cluster and evaluates how it benefits MPI communication performance. Numerical experiments show that these optimizations yield a 6.1x speed-up for the key “Scatter” phase over the initial GPU version using the same number of GPUs. The speed-up for the MPI communication part is 2.8x when the message size is over 3 KB. These findings demonstrate that optimizations tailored to the features of the simulation and of modern GPU clusters are essential for achieving significantly improved performance.

Key words: laser-plasma-interaction simulation, particle-in-cell (PIC), compute unified device architecture (CUDA), CUDA optimization, GPUDirect RDMA
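For readers unfamiliar with the "Scatter" phase named in the abstract, the following is a minimal sketch of one common form of PIC charge deposition (1D, periodic grid, linear cloud-in-cell weighting), preceded by a sort of the particles by cell index. It is an illustration under stated assumptions, not the authors' code: the function names `sort_by_cell` and `scatter_charge`, the 1D grid, and the weighting scheme are all hypothetical choices made here. On a GPU, the accumulation into `rho` is the step where many threads may update the same grid node at once, which is the kind of write conflict that duplication and sorting strategies typically aim to reduce.

```python
import math


def sort_by_cell(positions, charges, dx):
    """Reorder particles so that particles in the same cell are adjacent.

    Hypothetical helper: a plain stable sort keyed on the cell index.
    """
    order = sorted(range(len(positions)),
                   key=lambda i: math.floor(positions[i] / dx))
    return [positions[i] for i in order], [charges[i] for i in order]


def scatter_charge(positions, charges, num_cells, dx):
    """Deposit particle charge onto a periodic 1D grid (linear weighting).

    Each particle contributes to the two grid nodes that bracket it,
    weighted by its distance to each node.
    """
    rho = [0.0] * num_cells
    for x, q in zip(positions, charges):
        cell = math.floor(x / dx)
        frac = x / dx - cell            # fractional position inside the cell
        left = cell % num_cells
        right = (cell + 1) % num_cells  # wraps around at the domain boundary
        # On a GPU these two updates would need atomics or private copies.
        rho[left] += q * (1.0 - frac) / dx
        rho[right] += q * frac / dx
    return rho


# Assumed toy input: four unit charges on a four-cell grid with dx = 1.
positions = [2.5, 0.25, 3.75, 0.5]
charges = [1.0, 1.0, 1.0, 1.0]
xs, qs = sort_by_cell(positions, charges, dx=1.0)
rho = scatter_charge(xs, qs, num_cells=4, dx=1.0)
```

A quick sanity check for any such deposition is charge conservation: the sum of `rho` multiplied by `dx` should equal the total particle charge.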

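Abstract (translated from the Chinese): As techniques for generating ultra-short laser pulses continue to develop, kinetic descriptions of the interaction between such pulses and plasmas are becoming increasingly important. PIC (particle-in-cell) is a widely adopted method in plasma physics for studying the trajectories of charged particles in electromagnetic fields. Although some GPU implementations of the PIC method already exist, the characteristics of laser-plasma-interaction simulation leave many important issues open to alternative solutions. This paper proposes a feasible method for porting the original CPU-based LPI simulation code to the GPU in its entirety. It proposes a series of methods for accelerating the initial GPU version: a dynamic duplication algorithm, a mixed-precision algorithm, and a particle sorting algorithm. It applies and evaluates the GPUDirect RDMA (remote direct memory access) technique, which can improve MPI communication performance. Experimental results show that, compared with the initial GPU version, the speed-up of the “Scatter” phase is 6.1x, and that communication is 2.8x faster when the MPI message size exceeds 3 KB. These results demonstrate that optimizations tailored to the characteristics of the simulation application and of the GPU cluster can bring significant performance improvements.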

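Key words (translated from the Chinese): laser-plasma interaction, particle-in-cell simulation, compute unified device architecture (CUDA), CUDA optimization, GPUDirect RDMA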

WU Haipeng, WEN Minhua, SEE Simon, LIN James. Parallelization and Optimization of Laser-Plasma-Interaction Simulation[J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(4): 550-558.

武海鹏，文敏华，SEE Simon，林新华. 激光等离子体相互作用模拟的并行和加速研究[J]. 计算机科学与探索, 2018, 12(4): 550-558.
