计算机科学与探索 ›› 2019, Vol. 13 ›› Issue (2): 181-194.DOI: 10.3778/j.issn.1673-9418.1712006

• 高性能计算 • 上一篇    下一篇

基于空间密铺的并行Stencil算法

郭  鹏1,2+,袁  良1,张云泉1,黄  珊1,2   

  1. 1. 中国科学院 计算技术研究所 计算机体系结构国家重点实验室,北京 100190
    2. 中国科学院大学 计算机与控制学院,北京 100049
  • 出版日期:2019-02-01 发布日期:2019-01-25

Parallel Stencil Algorithm Based on Tessellating

GUO Peng1,2+, YUAN Liang1, ZHANG Yunquan1, HUANG Shan1,2   

  1. 1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
  • Online:2019-02-01 Published:2019-01-25

摘要: Stencil计算是一种科学和工程应用中常见的循环模式,而分块技术是一种提高数据局部性和并行性的强大转换方法。与以往直接对整个迭代空间进行分块的分块技术不同,提出了一种新的两层密铺分块的并行算法。首先,利用不同分块密铺数据空间;然后,所有分块沿时间维度扩展密铺迭代空间。该算法有以下优点:(1)最大化并发执行;(2)无冗余计算;(3)简洁的循环条件;(4)适应Stencil不同的尺寸、形状、阶数和边界条件。实验结果表明,对于3D27p Stencil,非周期边界的性能比Pluto高12%,周期边界的性能比Pochoir最高提升40%。

关键词: Stencil计算, 空间密铺, 分块方法

Abstract: Stencil computations represent a very common class of nested loops in scientific and engineering applications. The exhaustively studied tiling is one of the most powerful transformation techniques to explore the data locality and parallelism. Unlike previous work, which mostly blocks the iteration space of a Stencil directly, this paper proposes a novel two-level tessellation scheme. First, the data space is tiled with different blocks. Then, all blocks are tiled along the time dimension to iterate over the iteration space. The proposed scheme has the following advantages: (1) maximizing concurrent execution; (2) no redundancy computation; (3) simple loop conditions; (4) adapting to different sizes, shapes, orders, and boundary conditions of Stencil. The experimental results show that for 3D27p Stencil, the performance of the non-periodic boundary is 12% higher than that of Pluto, and the performance of the periodic boundary is up to 40% higher than that of Pochoir.

Key words: Stencil computation, tessellation, tiling