Design and Implementation of YOLOv2 Accelerator Based on Zynq7000 FPGA Heterogeneous Platform

doi:10.3778/j.issn.1673-9418.1903027

Abstract

Convolutional neural networks (CNNs) are now widely used in image classification, object detection and other computer vision fields. In the forward inference stage, however, many practical applications impose low-latency requirements and strict power constraints. To address this problem, a CNN accelerator with a single instruction multiple data (SIMD) architecture is designed and implemented on an FPGA (field-programmable gate array), using optimization strategies such as parameter reordering and multi-channel data transmission. Taking the YOLOv2 object detection algorithm as an example, the complete process of mapping a CNN model onto the FPGA is described. The performance and resource consumption of the accelerator are analyzed and modeled in depth, with the actual transmission delay taken into account, which reduces the error between the accelerator's theoretical and actual latency. In addition, the input and output modules of the accelerator are improved, which effectively increases the actual utilization of the bus bandwidth. Experimental results show that the accelerator achieves a performance of 30.15 GOP/s on the Zedboard. Compared with a Xeon E5-2620 v4 CPU, it delivers 120.4 times the energy efficiency and 7.3 times the performance. Compared with a dual-core ARM-A9 CPU, it delivers 86 times the energy efficiency and 112.9 times the performance.
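The abstract does not spell out the accelerator's performance model. The following Python sketch is a hedged illustration of why accounting for data transfer narrows the gap between theoretical and actual latency. It estimates the latency of one convolution layer on a generic tiled SIMD engine.

Everything in it is an assumption rather than the paper's method:

- the parallelism factors `tn` and `tm`
- the clock frequency and the effective bus bandwidth
- 16-bit data
- the simplification that every input, weight and output element crosses the bus exactly once

The layer shape is only illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """One convolution layer: N input channels, M output channels, R x C output, K x K kernel."""
    n_in: int
    m_out: int
    rows: int
    cols: int
    k: int


@dataclass
class Accel:
    """Hypothetical accelerator parameters (not the paper's configuration)."""
    tn: int                  # input channels processed in parallel per cycle (assumed)
    tm: int                  # output channels processed in parallel per cycle (assumed)
    freq_hz: float           # accelerator clock frequency (assumed)
    bw_bytes_s: float        # effective (measured) bus bandwidth, not the peak figure
    bytes_per_elem: int = 2  # assumed 16-bit fixed-point data


def layer_ops(layer: ConvLayer) -> int:
    """Operation count, counting one multiply-accumulate as 2 operations."""
    return 2 * layer.n_in * layer.m_out * layer.rows * layer.cols * layer.k ** 2


def compute_cycles(layer: ConvLayer, acc: Accel) -> int:
    """Cycles if the array performs tn * tm MACs per cycle with no stalls."""
    return (math.ceil(layer.m_out / acc.tm) * math.ceil(layer.n_in / acc.tn)
            * layer.rows * layer.cols * layer.k ** 2)


def transfer_seconds(layer: ConvLayer, acc: Accel) -> float:
    """Bus time, assuming each input, weight and output element crosses the bus once
    and the input map is about R x C (stride 1, 'same' padding)."""
    elems = (layer.n_in * layer.rows * layer.cols          # input feature map
             + layer.m_out * layer.n_in * layer.k ** 2     # weights
             + layer.m_out * layer.rows * layer.cols)      # output feature map
    return elems * acc.bytes_per_elem / acc.bw_bytes_s


def latency_seconds(layer: ConvLayer, acc: Accel, include_transfer: bool = True,
                    overlap: bool = True) -> float:
    """Compute-only estimate, or one that also charges for data transfer.

    With double buffering (overlap=True), transfers hide behind computation, so the
    slower of the two phases bounds the latency. Without it, the two phases add up.
    """
    t_compute = compute_cycles(layer, acc) / acc.freq_hz
    if not include_transfer:
        return t_compute
    t_transfer = transfer_seconds(layer, acc)
    return max(t_compute, t_transfer) if overlap else t_compute + t_transfer


if __name__ == "__main__":
    # Illustrative 3x3 layer (32 -> 64 channels, 208 x 208 output) and made-up accelerator.
    layer = ConvLayer(n_in=32, m_out=64, rows=208, cols=208, k=3)
    acc = Accel(tn=4, tm=32, freq_hz=150e6, bw_bytes_s=1.2e9)
    for label, kwargs in [("compute only", dict(include_transfer=False)),
                          ("overlapped transfer", dict(overlap=True)),
                          ("serial transfer", dict(overlap=False))]:
        t = latency_seconds(layer, acc, **kwargs)
        print(f"{label:20s} {t * 1e3:7.2f} ms  {layer_ops(layer) / t / 1e9:6.2f} GOP/s")
```

The compute-only estimate corresponds to a purely theoretical figure. The transfer-aware variants show how much bus time can add on top of it. Whether `max` or the sum is the better approximation for a real design depends on how far the input and output modules actually overlap transfers with computation.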

Key words: hardware accelerator, field-programmable gate array (FPGA), convolutional neural network (CNN), high-level synthesis

CHEN Chen, CHAI Zhilei, XIA Jun. Design and Implementation of YOLOv2 Accelerator Based on Zynq7000 FPGA Heterogeneous Platform[J]. Journal of Frontiers of Computer Science and Technology, 2019, 13(10): 1677-1693.
