Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (1): 80-90. DOI: 10.3778/j.issn.1673-9418.1512078

• High Performance Computing •

Optimizing Irregular Memory Access in Astrophysical Clustering Studies

HAO He1, SI Yumeng1, WEI Jianwen1, WEN Minhua1, LIN Xinhua1,2+

  1. Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, China
  2. NVIDIA Technology Center Asia Pacific, Singapore 999002
  • Online: 2017-01-01 Published: 2017-01-10

Abstract: The halo-based galaxy group finder (HGGF) identifies galaxies that reside in the same dark matter halo, which is not directly observable. It plays a crucial role in research on the large-scale structure and evolution of the universe. As data volumes grow, however, the HGGF algorithm must be optimized to shorten its running time. An analysis of the original HGGF code shows that the running time of the kernel (the hotspot of the algorithm) is severely affected by irregular memory access. Based on the structure of the algorithm and its irregular memory access pattern, this paper proposes a data pre-sorting approach and analyzes how it affects the memory access process. On this basis, data alignment and loop fission are used to further improve memory access efficiency, and load balancing and mutex privatization are used to improve the parallel efficiency of OpenMP. The optimized HGGF application achieves an 11.6-fold speedup on 12 threads and better weak scalability. The main contributions are: (1) an analysis of the irregular memory access in the HGGF application; (2) the proposal and analysis of the data pre-sorting approach; (3) improved parallel performance of the HGGF application through four further techniques: data alignment, loop fission, load balancing, and mutex privatization.

Key words: astrophysical clustering studies, optimizing irregular memory access, data pre-sorting, parallel computing
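
The abstract names data pre-sorting but does not state the sort key or data layout, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes a hypothetical one-dimensional spatial key per record (`cell`) and a kernel that visits records cell by cell. Reordering the records by that key turns the scattered accesses into sequential ones, which is measured here as the mean stride between consecutive accesses.

```python
import random


def mean_stride(order):
    """Average absolute jump between consecutively accessed array slots."""
    return sum(abs(b - a) for a, b in zip(order, order[1:])) / (len(order) - 1)


def presort(keys):
    """Sort record slots by key.

    Returns (perm, rank) where perm[new_slot] = old_slot and
    rank[old_slot] = new_slot.
    """
    perm = sorted(range(len(keys)), key=keys.__getitem__)
    rank = [0] * len(keys)
    for new_slot, old_slot in enumerate(perm):
        rank[old_slot] = new_slot
    return perm, rank


random.seed(0)
n = 10_000
# Hypothetical input: one spatial key (e.g. a grid-cell id) per record,
# stored in arbitrary catalogue order.
cell = [random.randrange(1000) for _ in range(n)]

# The kernel is assumed to visit records cell by cell, so in the original
# layout it jumps around memory.
visit_old = sorted(range(n), key=cell.__getitem__)

# Pre-sort: physically reorder the records by key, then replay the same visits.
perm, rank = presort(cell)
sorted_cell = [cell[old_slot] for old_slot in perm]
visit_new = [rank[old_slot] for old_slot in visit_old]

stride_before = mean_stride(visit_old)
stride_after = mean_stride(visit_new)
print(f"mean stride before: {stride_before:.1f}, after: {stride_after:.1f}")
```

In this toy setting the sorted layout is visited strictly front to back. A real catalogue would need a key that preserves spatial locality in three dimensions, which the abstract does not specify.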