Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (6): 863-874.DOI: 10.3778/j.issn.1673-9418.1611079

Previous Articles     Next Articles

Implementation and Optimization of Multidimensional FFT Algorithm on Large-Scale Clusters

LI Kun1,2+, JIA Haipeng1, CAO Ting1, ZHANG Yunquan1   

  1. 1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100190, China
  • Online:2017-06-01 Published:2017-06-07


李  琨1,2+,贾海鹏1,曹  婷1,张云泉1   

  1. 1. 中国科学院 计算技术研究所 计算机体系结构国家重点实验室,北京 100190
    2. 中国科学院大学 计算机与控制学院,北京 100190

Abstract: Fast Fourier transform (FFT) is a fast algorithm which computes the discrete Fourier transform (DFT) of a sequence, or its inverse. Fast Fourier transform is widely used for many applications in engineering, science and mathematics, such as signal decomposition, digital filter and image processing. As a result, the fine-grained optimization of FFT is extremely important in application. This paper studies the typical decomposition strategies of FFT, the parallel implementation of FFT algorithms on large-scale clusters and proposes some optimization strategies. On that basis, this paper also evaluates a variety of FFT algorithms performance on different platforms, then analyzes the implementation, strength and weakness, and the computing scalability on large-scale clusters of these algorithms. The experimental results show that the research of this paper can contribute to the further optimization on existing FFT algorithms, and instruct to choose an FFT algorithm which performance is better on large-scale heterogeneous systems (CPU and GPU) in order to satisfy the specified requirements.

Key words: cluster, fast Fourier transforms (FFT), message passing interface (MPI), performance optimization

摘要: 快速傅里叶变换(fast Fourier transform,FFT)是用于计算离散傅里叶变换(discrete Fourier transform,DFT)或其逆运算的快速算法,在工程、科学和数学领域的应用非常广泛,例如信号分解、数字滤波、图像处理等。因此,在实际应用中对FFT算法进行细粒度优化是非常重要的。研究了FFT算法常用的分解策略以及FFT算法在大规模集群系统上的并行实现,并提出了相关的优化策略。在此基础上,对多种FFT算法在不同平台上进行了性能评估,并分析了各算法的实现、优缺点及其在大规模计算时的可扩展性。实验结果表明,相关研究有助于对现有的FFT算法进行进一步的优化,以及指导如何在大规模CPU+GPU的异构系统上根据不同需求选择实现性能更优的FFT算法。

关键词: 集群, 快速傅里叶变换(FFT), 消息传递接口(MPI), 性能优化