Journal of Frontiers of Computer Science and Technology ›› 2015, Vol. 9 ›› Issue (10): 1153-1162. DOI: 10.3778/j.issn.1673-9418.1412057

• High Performance Computing •

OpenMP Performance Analysis of CFD Application on Intel Multicore and Manycore Architectures

CHE Yonggang1,2+, ZHANG Lilun1,2, WANG Yongxian1,2, XU Chuanfu2, CHENG Xinghua2

  1. Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China
  2. College of Computer, National University of Defense Technology, Changsha 410073, China
  • Online: 2015-10-01 Published: 2015-09-29

Abstract: Multicore and manycore processors have become mainstream architectures in high performance computing, and OpenMP programming is one of the primary means of exploiting their parallel computing capability. Using a systematic approach that combines hardware performance counter based measurement with model based analysis, this paper evaluates the OpenMP performance of a real-world high-order structured-grid CFD (computational fluid dynamics) application on two Intel platforms: the Xeon E5 Sandy Bridge multicore processor and the Knights Corner many integrated core coprocessor. The analysis focuses on the performance impact of OpenMP library overhead, load balance among OpenMP threads, and main memory bandwidth. The results show that the redundant computation introduced by OpenMP parallelization has little effect on parallel efficiency, whereas the serial portion and load imbalance affect it significantly, and that main memory bandwidth significantly affects the achieved floating-point performance. The paper also compares the application's performance on the two architectures and discusses directions for further optimization.

Key words: multicore, many integrated core, CFD application, OpenMP, performance analysis
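
As a rough illustration of why the serial portion, load imbalance, and memory bandwidth matter, the sketch below uses two textbook models: Amdahl's law extended with per-thread work shares, and the roofline bound. It is not the model used in the paper. The function names, parameters, and any numbers passed to them are hypothetical.

```python
def parallel_efficiency(serial_fraction, work_shares):
    """Amdahl-style efficiency estimate with load imbalance (textbook model).

    serial_fraction: fraction of the single-thread runtime that stays serial.
    work_shares: relative amount of parallel work given to each thread.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if not work_shares or min(work_shares) < 0 or sum(work_shares) <= 0:
        raise ValueError("work_shares must be non-negative with a positive sum")

    total = sum(work_shares)
    shares = [w / total for w in work_shares]  # normalise so shares sum to 1
    threads = len(shares)

    # The parallel region ends only when the most heavily loaded thread ends.
    parallel_time = (1.0 - serial_fraction) * max(shares)
    speedup = 1.0 / (serial_fraction + parallel_time)
    return speedup / threads


def roofline_gflops(peak_gflops, bandwidth_gb_per_s, flops_per_byte):
    """Roofline bound: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_per_s * flops_per_byte)
```

Under these assumptions, 16 perfectly balanced threads with a 5% serial fraction give an estimated efficiency of about 0.57, and giving one of the 16 threads twice the work of the others lowers the efficiency to about 0.53 even with no serial part. In the roofline function, a kernel with low arithmetic intensity is limited by `bandwidth_gb_per_s * flops_per_byte` rather than by the peak floating-point rate.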