Journal of Frontiers of Computer Science and Technology, 2015, Vol. 9, Issue (9): 1093-1099. DOI: 10.3778/j.issn.1673-9418.1412064

• High Performance Computing •

Research on Hybrid Programming and Optimization of BCC_AGCM_T106 on Intel Many Integrated Core

FANG Baohui¹⁺, XU Jinxiu¹, WEI Min²,³, ZHOU Mingzhong¹

  1. Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083, China
    2. National Meteorological Information Center, Beijing 100081, China
    3. Center for Earth System Science, Tsinghua University, Beijing 100084, China
  • Online: 2015-09-01  Published: 2015-12-11

Abstract: Numerical weather and climate models are the basic tools of weather forecasting and climate prediction. As technology has advanced, model resolution has increased substantially, and higher resolution makes the computational cost grow exponentially, while the timeliness required of weather and climate forecasts places ever higher demands on both the design of parallel programs and the performance of computing platforms. Taking the climate model BCC_AGCM_T106 as a case study and the Intel Xeon Phi™ coprocessor as the experimental platform, this paper explores the feasibility of hybrid heterogeneous programming and optimization. It implements a hybrid scheme with MPI (message passing interface) parallelism on the CPU host and OpenMP parallelism on the MIC (many integrated core) card. Because the scheme inherits the MPI-level parallelism of the original code, it substantially reduces development cost. Performance was measured using two CPU processes and one MIC card as an example. The results show that the core section of T106 accelerates markedly on the MIC as the number of MIC threads increases. Compared with the pure MPI program that does not use the MIC card, however, the overall speedup is small, mainly because the T106 core section contains too little computation while the MIC card and the host exchange too much data.
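The explanation for the small overall speedup can be made concrete with a simple cost model. The sketch below is an illustrative assumption, not the authors' method: it treats one offload of a code section as a fixed overhead plus host-card data transfer plus card-side compute, and it assumes ideal linear scaling with the number of coprocessor threads and a fixed transfer volume per offload. The class `OffloadModel`, its fields (`work_flop`, `bytes_moved`, `link_bw`, and so on), and every number in the example are hypothetical and are not measurements from this paper.

```python
"""Toy cost model for offloading one code section from a host CPU to a coprocessor.

All parameters are hypothetical; nothing here is measured data from the paper.
"""
from dataclasses import dataclass


@dataclass
class OffloadModel:
    work_flop: float       # floating-point operations in the offloaded section
    bytes_moved: float     # bytes copied host->card plus card->host per offload
    host_flops: float      # sustained host rate for this section (FLOP/s)
    thread_flops: float    # sustained rate of one coprocessor thread (FLOP/s)
    link_bw: float         # effective host<->card bandwidth (bytes/s)
    launch_s: float = 0.0  # fixed per-offload overhead (s)

    def host_time(self) -> float:
        """Time to run the section on the host without offloading."""
        return self.work_flop / self.host_flops

    def transfer_time(self) -> float:
        """Time spent moving data across the host<->card link."""
        return self.bytes_moved / self.link_bw

    def kernel_time(self, threads: int) -> float:
        """Card-side compute time, assuming ideal linear scaling with threads."""
        return self.work_flop / (self.thread_flops * threads)

    def offload_time(self, threads: int) -> float:
        """End-to-end cost of one offload: overhead + transfer + compute."""
        return self.launch_s + self.transfer_time() + self.kernel_time(threads)

    def speedup(self, threads: int) -> float:
        """Host-only time divided by offload time (>1 means offloading wins)."""
        return self.host_time() / self.offload_time(threads)

    def max_speedup(self) -> float:
        """Upper bound as threads grow: compute vanishes, transfer remains."""
        return self.host_time() / (self.launch_s + self.transfer_time())


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the shape of the trade-off.
    m = OffloadModel(work_flop=2e9, bytes_moved=4e8, host_flops=2e10,
                     thread_flops=1e9, link_bw=6e9)
    for t in (60, 120, 240):
        print(f"threads={t:4d} kernel={m.kernel_time(t):.4f}s "
              f"total={m.offload_time(t):.4f}s speedup={m.speedup(t):.2f}")
    print(f"upper bound on speedup: {m.max_speedup():.2f}")
```

With these illustrative values, going from 60 to 240 threads shrinks the card-side compute time fourfold, yet the end-to-end speedup stays below `max_speedup()` (1.5 here) because the transfer term does not shrink as threads are added. This is a qualitative picture of the effect described above, not a reproduction of the paper's measurements.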

Key words: BCC_AGCM_T106, MIC architecture, hybrid heterogeneous programming, offload mode