Journal of Frontiers of Computer Science and Technology, 2013, Vol. 7, Issue 5: 472-479. DOI: 10.3778/j.issn.1673-9418.1212012

GPU Based Parallel Method for Dynamic Collision Grid DSMC

WEN Minhua1+, LIN Xinhua1, Simon Chong Wee See1,2   

  1. High Performance Computing Center, Shanghai Jiao Tong University, Shanghai 200240, China
  2. NVIDIA Corporation, Santa Clara, California 95051, USA
  • Online: 2013-05-01 Published: 2013-05-03

Parallelization of the Dynamic Grid DSMC Method on GPU

文敏华1+,林新华1,Simon Chong Wee See1,2   

  1. 上海交通大学 高性能计算中心,上海 200240
  2. NVIDIA Corporation,美国 加利福尼亚 圣克拉拉 95051

Abstract: The direct simulation Monte Carlo (DSMC) method is a powerful computational tool in rarefied gas dynamics. However, it has two main shortcomings: complex grid processing and long computing time. The dynamic collision grid DSMC method generates collision grids adaptively according to the flowfield, which overcomes the first shortcoming. To address the second, the dynamic collision grid DSMC method is ported to the graphics processing unit (GPU) as a parallel program written with compute unified device architecture (CUDA). In the parallel implementation, the main computation is performed on the GPU, while the CPU handles only initialization and output. A two-dimensional benchmark problem at several sizes is used to demonstrate the correctness of the parallelization. The results show a 10-fold speedup on the NVIDIA Fermi C2050. For the same case, performance on NVIDIA's newly released Kepler K20 is 1.3 to 1.6 times that on the Fermi C2050.

Key words: compute unified device architecture (CUDA), graphics processing unit (GPU), direct simulation Monte Carlo (DSMC), dynamic collision grid DSMC, parallel simulation

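Abstract (translated from Chinese): The direct simulation Monte Carlo (DSMC) method is an important tool in rarefied gas dynamics. However, it has two major drawbacks: complex grid processing and a heavy computational load. The dynamic grid DSMC method addresses the first by dynamically generating adaptive collision grids from the flowfield information. To address the second, a parallel program is written with compute unified device architecture (CUDA), porting the dynamic grid DSMC method to the graphics processing unit (GPU) to reduce computing time. In the parallel implementation, the GPU performs the vast majority of the computation, while the CPU handles only a small amount of work such as initialization and result output. A two-dimensional problem of supersonic flow over a flat plate is used as the test case to verify the correctness of the parallel program. For cases of different sizes, speedups of more than 10 times are obtained on the NVIDIA Fermi C2050; for the same case, the speed on NVIDIA's newly released Kepler K20 is about 1.3 to 1.6 times that on the Fermi C2050.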

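Key words (translated from Chinese): compute unified device architecture (CUDA), graphics processing unit (GPU), direct simulation Monte Carlo (DSMC), dynamic grid DSMC, parallel simulation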
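
The abstract states that collision grids are generated adaptively from the flowfield, but it does not give the refinement criterion. The following is a minimal CPU-only sketch of one plausible reading, not the authors' CUDA implementation. It assumes fixed square sampling cells, each subdivided into n x n collision sub-cells so that a sub-cell holds roughly a target number of particles. The function name `collision_grid`, the parameter `target_per_subcell`, and the square-root subdivision rule are all hypothetical choices made for illustration.

```python
import math


def collision_grid(particles, cell_size, target_per_subcell=8):
    """Bin 2D particles into sampling cells, then subdivide each cell.

    Assumed rule (illustrative only): a cell with N particles is split
    into n x n sub-cells with n = floor(sqrt(N / target_per_subcell)),
    at least 1, so denser regions get finer collision sub-cells.
    Returns {cell_key: (n, {subcell_key: [particles]})}.
    """
    # Step 1: assign every particle to its fixed sampling cell.
    cells = {}
    for x, y in particles:
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append((x, y))

    # Step 2: choose a subdivision level per cell from its local count.
    grid = {}
    for key, members in cells.items():
        n = max(1, int(math.sqrt(len(members) / target_per_subcell)))
        sub = cell_size / n
        x0, y0 = key[0] * cell_size, key[1] * cell_size
        subcells = {}
        for x, y in members:
            # Clamp to n - 1 to guard against floating-point round-off.
            i = min(n - 1, int((x - x0) / sub))
            j = min(n - 1, int((y - y0) / sub))
            subcells.setdefault((i, j), []).append((x, y))
        grid[key] = (n, subcells)
    return grid
```

In this sketch the per-particle binning in both steps is independent work for each particle, which is the kind of loop that a GPU port would typically distribute across threads; how the paper actually organizes its kernels is not stated in the abstract.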