Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (3): 437-448. DOI: 10.3778/j.issn.1673-9418.1906042

• Artificial Intelligence •


Design of FPGA Accelerator Architecture for Convolutional Neural Network

LI Bingjian, QIN Guoxuan, ZHU Shaojie, PEI Zhihui   

  1. School of Microelectronics, Tianjin University, Tianjin 300072, China
  • Online: 2020-03-01  Published: 2020-03-13


Abstract:

With the rapid development of artificial intelligence, convolutional neural networks (CNN) play an increasingly important role in many fields. This paper analyzes existing convolutional neural network models and designs a convolutional neural network accelerator based on a field-programmable gate array (FPGA). The convolution operation is parallelized along four dimensions, and a parameterized architecture is proposed in which, under three parameter configurations, a single clock cycle completes 512, 1024, or 2048 multiply-accumulate operations, respectively. An on-chip double-buffering structure reduces off-chip memory accesses while enabling effective data reuse, and a pipeline implements the complete single-layer computation of a neural network, improving computational efficiency. Experiments comparing the design with CPU, GPU, and related FPGA acceleration schemes show that the proposed design reaches a computing speed of 560.2 GOP/s, 8.9 times that of an i7-6850K CPU, while its performance per watt is 3.0 times that of an NVIDIA GTX 1080Ti GPU. Compared with related work, the designed accelerator achieves a high performance-per-watt ratio on mainstream CNN networks while remaining general-purpose.
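
The abstract describes the design only at a high level, so the sketch below is a minimal, software-only illustration of what a four-dimensionally parallel, parameterized convolution loop nest can look like; it is not the paper's actual architecture. The function name conv_tile, the choice of parallel dimensions (input channels, output channels, output rows, output columns), and the unroll factors PN, PM, PR, PC are all assumptions, with the factors merely chosen so that their product equals the 512 MACs-per-cycle configuration mentioned in the abstract. The double buffering and inter-stage pipelining described in the abstract are not shown.

```cpp
// Illustrative sketch only: unroll factors and loop order are hypothetical.
constexpr int PN = 8;   // parallel input channels   (assumed)
constexpr int PM = 16;  // parallel output channels  (assumed)
constexpr int PR = 2;   // parallel output rows      (assumed)
constexpr int PC = 2;   // parallel output columns   (assumed)
// PN * PM * PR * PC = 512 multiply-accumulates per cycle; doubling PM and
// then PR would correspond to the 1024 and 2048 configurations.

// One convolution layer tile, stride 1, no padding; `out` must be
// zero-initialized by the caller. In hardware, an HLS tool would fully
// unroll the four innermost loops (e.g. via #pragma HLS UNROLL) into a
// PN*PM*PR*PC-wide MAC array; in software they simply run sequentially.
void conv_tile(const float* in,       // [N][H][W], flattened
               const float* weights,  // [M][N][K][K], flattened
               float* out,            // [M][H_out][W_out], flattened
               int N, int M, int H, int W, int K) {
  const int H_out = H - K + 1;
  const int W_out = W - K + 1;
  for (int m0 = 0; m0 < M; m0 += PM)
    for (int r0 = 0; r0 < H_out; r0 += PR)
      for (int c0 = 0; c0 < W_out; c0 += PC)
        for (int n0 = 0; n0 < N; n0 += PN)
          for (int kr = 0; kr < K; ++kr)
            for (int kc = 0; kc < K; ++kc)
              // Four innermost loops = the four parallelized dimensions.
              for (int m = m0; m < m0 + PM && m < M; ++m)
                for (int r = r0; r < r0 + PR && r < H_out; ++r)
                  for (int c = c0; c < c0 + PC && c < W_out; ++c)
                    for (int n = n0; n < n0 + PN && n < N; ++n)
                      out[(m * H_out + r) * W_out + c] +=
                          in[(n * H + (r + kr)) * W + (c + kc)] *
                          weights[((m * N + n) * K + kr) * K + kc];
}
```

Counting each multiply-accumulate as two operations, the three configurations give peak throughputs of 2 x 512 x f, 2 x 1024 x f, and 2 x 2048 x f operations per second at clock frequency f. The abstract does not state the clock frequency, and the 560.2 GOP/s it reports is the achieved figure, so this relation serves only as an upper-bound sanity check.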

Key words: hardware accelerator, field-programmable gate array (FPGA), convolutional neural network (CNN), parameterized architecture, pipeline