计算机科学与探索 ›› 2014, Vol. 8 ›› Issue (6): 665-673.DOI: 10.3778/j.issn.1673-9418.1403009

• 高性能计算 • 上一篇    下一篇

基于CUDA的并行布谷鸟搜索算法设计与实现

韦向远1+,杨辉华2,谢谱模3   

  1. 1. 桂林电子科技大学 电子工程与自动化学院,广西 桂林 541004
    2. 桂林电子科技大学 广西信息科学实验中心,广西 桂林 541004
    3. 桂林电子科技大学 计算机科学技术学院,广西 桂林 541004
  • 出版日期:2014-06-01 发布日期:2014-05-30

Research and Implementation of Parallel Cuckoo Search Based on CUDA

WEI Xiangyuan1+, YANG Huihua2, XIE Pumo3   

  1. 1. Department of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
    2. Guangxi Experiment Center of Information Science, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
    3. Department of Computer Science and Technology, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Online:2014-06-01 Published:2014-05-30

摘要: 布谷鸟搜索(cuckoo search,CS)算法是近几年发展起来的智能元启发式算法,已经被成功应用于多种优化问题中。针对CS算法在求解大数据、大规模复杂问题时,计算时间过长的问题,提出了一种基于统一计算设备架构(compute unified device architecture,CUDA)的并行布谷鸟搜索算法。该算法的并行实现采用任务并行与数据并行相结合的方式,利用图形处理器(graphic processing unit,GPU)线程块与线程分别映射布谷鸟个体与个体的每一维数据,并行实现CS算法中的鸟巢位置更新、个体适应度评估、鸟巢重建、寻找最优个体操作。整个CS算法的寻优迭代过程完全通过GPU实现,降低了算法计算过程中CPU与GPU的通信开销。对4个经典基准测试函数进行了仿真实验,结果表明,相比标准CS算法,基于CUDA架构的并行CS算法在求解收敛性一致的前提下,在求解速度上获得了高达110倍的计算加速比。

关键词: 布谷鸟搜索算法, 并行计算, 图形处理器(GPU), 统一计算设备架构(CUDA)

Abstract: Cuckoo search (CS) is an intelligent meta-heuristic algorithm which has developed in recent years and has been successfully applied to a number of optimization problems. In order to solve the time consuming problem of CS when dealing with the large amounts of data and large-scale complex problems, this paper proposes a parallel version of Cuckoo Search algorithm based on CUDA (compute unified device architecture). The GPU (graphic processing unit) thread blocks and threads are mapped to each individual and the dimensions of individual, respectively. Task parallelism and data parallelism combination is adopted to implement nest position update, individual fitness evaluation, rebuilding nest and searching for best individual operations of CS algorithm. The whole optimization process can be parallel implementation by GPU to reduce the communication overhead between CPU and GPU. The experimental results on 4 classical benchmark functions show that the parallel CS algorithm obtains 110 times speedup with the similar quality compared to a standard sequential CS algorithm.

Key words: cuckoo search, parallel computing, graphic processing unit (GPU), compute unified device architecture (CUDA)