Design and Implementation of YOLOv2 Accelerator Based on Zynq7000 FPGA Heterogeneous Platform

doi:10.3778/j.issn.1673-9418.1903027

Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue (10): 1677-1693.

CHEN Chen, CHAI Zhilei, XIA Jun

1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125, China

Online: 2019-10-01 Published: 2019-10-15

Abstract

Abstract: Convolutional neural networks (CNNs) are now widely used in computer vision tasks such as image classification and object detection. In the forward inference stage, however, many practical applications impose low-latency requirements and strict power limits. To address this, a CNN accelerator architecture with a single instruction, multiple data (SIMD) structure is designed and implemented on a field-programmable gate array (FPGA), using optimization strategies such as parameter reordering and multi-channel data transmission. Taking the YOLOv2 object detection algorithm as an example, the complete flow for mapping a CNN model onto an FPGA is described. The accelerator's performance and resource consumption are analyzed and modeled in depth; because the model takes actual transfer latency into account, it narrows the error between the accelerator's theoretical and actual latency. The input and output modules of the accelerator architecture are also improved, which effectively raises the actual utilization of bus bandwidth. Experimental results show a performance of 30.15 GOP/s on the Zedboard. Compared with the Xeon E5-2620 v4 CPU, the accelerator achieves 120.4 times the energy efficiency and 7.3 times the performance; compared with the dual-core ARM-A9 CPU, it achieves 86 times the energy efficiency and 112.9 times the performance.

Key words: hardware accelerator, field-programmable gate array (FPGA), convolutional neural network (CNN), high-level synthesis
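
The reported gains can be related to one another with a short calculation. The sketch below assumes that energy efficiency is defined as throughput per unit power (for example GOP/s per watt); the abstract does not state the definition, so this is an assumption, and the function name `implied_power_ratio` is hypothetical. Under that assumption, the efficiency gain equals the performance gain multiplied by the ratio of baseline power to accelerator power, so the power ratio can be backed out from the two published figures.

```python
def implied_power_ratio(efficiency_gain: float, performance_gain: float) -> float:
    """Return baseline_power / accelerator_power implied by the two gains.

    Assumes energy efficiency = throughput / power, which gives
    efficiency_gain = performance_gain * (baseline_power / accelerator_power).
    """
    if performance_gain <= 0:
        raise ValueError("performance_gain must be positive")
    return efficiency_gain / performance_gain


# Gains reported in the abstract: (energy-efficiency gain, performance gain)
# of the accelerator relative to each CPU baseline.
reported = {
    "Xeon E5-2620 v4": (120.4, 7.3),
    "dual-core ARM-A9": (86.0, 112.9),
}

ratios = {name: implied_power_ratio(e, p) for name, (e, p) in reported.items()}

for name, ratio in ratios.items():
    print(f"{name}: baseline power / accelerator power = {ratio:.2f}")
```

A ratio above 1 would mean the baseline draws more power than the accelerator, and a ratio below 1 the opposite. These values are only as reliable as the assumed definition of energy efficiency and the way power was measured in the paper.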

CHEN Chen, CHAI Zhilei, XIA Jun. Design and Implementation of YOLOv2 Accelerator Based on Zynq7000 FPGA Heterogeneous Platform[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(10): 1677-1693.
