Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2020, Vol. 14 ›› Issue (11): 1838-1848. DOI: 10.3778/j.issn.1673-9418.1907054

• High Performance Computing •

Automatic Optimization of Parallel Parameters for Sunway TaihuLight Supercomputer Application Program
(神威国产处理器应用程序的并行参数自动寻优)

LIU Xu (刘徐), XIAO Zhiyong (肖志勇), GAN Lin (甘霖), XU Jingheng (徐敬蘅), CHEN Hongbo (陈宏博)

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. National Supercomputing Center in Wuxi, Wuxi, Jiangsu 214131, China
  3. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Online: 2020-11-01 Published: 2020-11-09

Abstract:

Finite difference stencil computations are often run on Sunway TaihuLight for tasks such as atmospheric simulation and oil exploration. Because the algorithm has high communication overhead and high computational density, the Sunway system architecture is complex, and application data sets are large, it is difficult to find reasonable parameters for partitioning the data when a program is built and executed, so application performance is hard to guarantee. Targeting the hardware characteristics of the Sunway 26010 processor, this paper proposes an automatic parallel parameter optimization method based on a genetic algorithm. The method automatically tunes the data size parameters at the message passing interface (MPI) level and at the slave-core level, and it is evaluated on a two-dimensional finite difference stencil computation. Searching a space of size one billion, it finds a better solution that achieves a 10.79x speedup over the allocation chosen automatically by the compiler system. The method is also tested on the reverse time migration imaging algorithm, where it achieves a 6.31x speedup over the compiler system's automatic allocation. By automatically tuning the data size parameters of applications, the method provides useful guidance for high-performance parallel optimization on domestic heterogeneous many-core processors.

Key words: parallel computing, automatic parameter optimization, genetic algorithm, Sunway heterogeneous many-core processor, finite difference algorithm
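
The abstract describes a genetic algorithm that searches over data size parameters at the MPI level and the slave-core level. The following is a minimal sketch of that general idea, not the authors' implementation. The parameter ranges, the names `MPI_SIZES` and `CPE_SIZES`, the function `synthetic_runtime`, and the operator choices (elitist selection, uniform crossover, random-reset mutation) are all assumptions made for illustration. A real tuner would replace the synthetic cost with the measured run time of the compiled program, and the toy search space here has only 56 points rather than one billion.

```python
import random

# Hypothetical search space: candidate block sizes for the MPI-level and
# slave-core-level data partition (names and ranges are illustrative only).
MPI_SIZES = [2 ** k for k in range(4, 12)]   # 16 .. 2048
CPE_SIZES = [2 ** k for k in range(2, 9)]    # 4 .. 256


def synthetic_runtime(mpi_size, cpe_size):
    """Stand-in cost model; a real tuner would build and time the program."""
    return abs(mpi_size - 512) / 512 + abs(cpe_size - 32) / 32 + 1.0


def random_individual(rng):
    """Draw one (mpi_size, cpe_size) pair uniformly from the search space."""
    return (rng.choice(MPI_SIZES), rng.choice(CPE_SIZES))


def crossover(a, b, rng):
    """Uniform crossover: each gene is taken from one of the two parents."""
    return (rng.choice((a[0], b[0])), rng.choice((a[1], b[1])))


def mutate(ind, rng, rate=0.2):
    """Random-reset mutation: each gene is redrawn with probability `rate`."""
    mpi, cpe = ind
    if rng.random() < rate:
        mpi = rng.choice(MPI_SIZES)
    if rng.random() < rate:
        cpe = rng.choice(CPE_SIZES)
    return (mpi, cpe)


def genetic_search(cost, pop_size=20, generations=30, elite=4, seed=0):
    """Return the lowest-cost individual found; lower cost means faster."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: cost(*ind))
        parents = population[:elite]  # elitist selection keeps the best as-is
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return min(population, key=lambda ind: cost(*ind))


best = genetic_search(synthetic_runtime)
print("best (mpi_size, cpe_size):", best)
```

Because the elite individuals are carried over unchanged, the best cost in the population never gets worse from one generation to the next. That property matters when each evaluation is an expensive build-and-run cycle.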