[1] SZE V, CHEN Y H, EMER J, et al. Hardware for machine learning: challenges and opportunities[C]//Proceedings of the Custom Integrated Circuits Conference, Austin, Apr 30-May 3, 2017. Piscataway: IEEE, 2017: 1-8.
[2] DIEBOLD F. What's the big idea? "Big Data" and its origins[J]. Significance, 2021, 18(1): 36-37.
[3] BAO P P, TAO C Q, HUANG Z Q. Research on data quality of open source code data[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(3): 389-400.
包盼盼, 陶传奇, 黄志球. 面向开源源码大数据的数据质量研究[J]. 计算机科学与探索, 2020, 14(3): 389-400.
[4] YU Z, ZHOU H, JIANG L. Optimized allocation of FPGA memory for image processing[J]. Microprocessors and Microsystems, 2021, 80: 103592.
[5] ALKHELAIWI M, BOULILA W, AHMAD J, et al. An efficient approach based on privacy-preserving deep learning for satellite image classification[J]. Remote Sensing, 2021, 13(11): 2221.
[6] XU D G, WANG L, LI F. Review of typical object detection algorithms for deep learning[J]. Computer Engineering and Applications, 2021, 57(8): 10-25.
许德刚, 王露, 李凡. 深度学习的典型目标检测算法研究综述[J]. 计算机工程与应用, 2021, 57(8): 10-25.
[7] RABIE A. Detecting adversarial attacks on audio-visual speech recognition using deep learning method[J]. International Journal of Speech Technology, 2021.
[8] LI G J, LIANG S, NIE S, et al. Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition[J]. Neural Networks, 2021, 141: 225-237.
[9] KRUG A, EBRAHIMZADEH M, ALEMANN J, et al. Analyzing and visualizing deep neural networks for speech recognition with saliency-adjusted neuron activation profiles[J]. Electronics, 2021, 10(11): 1350.
[10] SHEN Y C, HSIA T C, HSU C H. Analysis of electronic health records based on deep learning with natural language processing[J]. Arabian Journal for Science and Engineering, 2021. DOI:10.1007/s13369-021-05596-6.
[11] SHI Y, FENG D Z, CHENG Y, et al. A natural language-inspired multilabel video streaming source identification method based on deep neural networks[J]. Signal, Image and Video Processing, 2021, 15: 1161-1168.
[12] KOROTEEV M V. BERT: a review of applications in natural language processing and understanding[J]. arXiv:2103.11943, 2021.
[13] YASMEEN F, SHERINE R. Optimizing MRI registration using software/hardware co-design model on FPGA[J]. International Journal of Innovative Technology and Exploring Engineering, 2020, 10(2): 128-137.
[14] WANG T, WANG C, ZHOU X, et al. An overview of FPGA based deep learning accelerators: challenges and opportunities[C]//Proceedings of 2019 IEEE 21st International Conference on High Performance Computing and Communications, Zhangjiajie, Aug 10-12, 2019. Piscataway: IEEE, 2019: 1674-1681.
[15] HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
[16] KUANG H, GUO Q, LI S Q, et al. Short-term wind power forecasting model based on multi-feature extraction and CNN-LSTM[J]. IOP Conference Series: Earth and Environmental Science, 2021, 702(1): 012019.
[17] ZHANG J Y, WANG H L, GUO Y, et al. Review of deep learning[J]. Application Research of Computers, 2018, 35(7): 1921-1928.
张军阳, 王慧丽, 郭阳, 等. 深度学习相关研究综述[J]. 计算机应用研究, 2018, 35(7): 1921-1928.
[18] GOODFELLOW I, BENGIO Y, COURVILLE A. Deep learning[M]. Cambridge: MIT Press, 2016: 813-814.
[19] YANG P W, ZHOU Y H, XING G, et al. Applications of convolutional neural network in biomedical image[J]. Computer Engineering and Applications, 2021, 57(7): 44-58.
杨培伟, 周余红, 邢岗, 等. 卷积神经网络在生物医学图像上的应用进展[J]. 计算机工程与应用, 2021, 57(7): 44-58.
[20] CHEN C, YAN W, XIA J, et al. Design and implementation of FPGA-based deep learning object detection system[J]. Application of Electronic Technique, 2019, 45(8): 40-43.
陈辰, 严伟, 夏珺, 等. 基于FPGA的深度学习目标检测系统的设计与实现[J]. 电子技术应用, 2019, 45(8): 40-43.
[21] BOHN J, FEISCHL M. Recurrent neural networks as optimal mesh refinement strategies[J]. Computers and Mathematics with Applications, 2021, 97: 61-76.
[22] WU A C. Neural networks and deep learning[M]. Beijing: Electronic Industry Press, 2016: 348.
吴岸城. 神经网络与深度学习[M]. 北京:电子工业出版社, 2016: 348.
[23] AHMAD A, PASHA M A. Optimizing hardware accelerated general matrix-matrix multiplication for CNNs on FPGAs[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2020, 67(11): 2692-2696.
[24] KALA S, NALESH S. Efficient CNN accelerator on FPGA[J]. IETE Journal of Research, 2020, 66(6): 733-740.
[25] MA Y F, CAO Y, VRUDHULA S, et al. Performance modeling for CNN inference accelerators on FPGA[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 39(4): 843-856.
[26] LI B J, QIN G X, ZHU S J, et al. Design of FPGA accelerator architecture for convolutional neural network[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(3): 437-448.
李炳剑, 秦国轩, 朱少杰, 等. 面向卷积神经网络的FPGA加速器架构设计[J]. 计算机科学与探索, 2020, 14(3): 437-448.
[27] ZHANG C, LI P, SUN G, et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks[C]//Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, Feb 22-24, 2015. New York: ACM, 2015: 161-170.
[28] ZHOU Y, JIANG J. An FPGA-based accelerator implementation for deep convolutional neural networks[C]//Proceedings of the 2015 4th International Conference on Computer Science and Network Technology, Harbin, Dec 19-20, 2015. Piscataway: IEEE, 2015: 829-832.
[29] HUANG Z. Research and implementation of FPGA acceleration for deep learning algorithm[D]. Chengdu: University of Electronic Science and Technology of China, 2019.
黄圳. 深度学习算法的FPGA硬件加速研究与实现[D]. 成都:电子科技大学, 2019.
[30] SHEN Y, FERDMAN M, MILDER P. Maximizing CNN accelerator efficiency through resource partitioning[C]//Proceedings of the 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture, Toronto, Jun 24-28, 2017. Piscataway: IEEE, 2017: 535-547.
[31] SUN Y X, AMANO H. FiC-RNN: a multi-FPGA acceleration framework for deep recurrent neural networks: special section on parallel, distributed, and reconfigurable computing, and networking[J]. IEICE Transactions on Information and Systems, 2020, 103(12): 2457-2462.
[32] GENG T, WANG T, SANAULLAH A, et al. A framework for acceleration of CNN training on deeply-pipelined FPGA clusters with work and weight load balancing[C]//Proceedings of the 2018 28th International Conference on Field Programmable Logic and Applications, Dublin, Aug 27-31, 2018. Piscataway: IEEE, 2018: 394-398.
[33] ZHANG J L, LI J. Improving the performance of OpenCL-based FPGA accelerator for convolutional neural network[C]//Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, Feb 22-24, 2017. New York: ACM, 2017: 25-34.
[34] NURVITADHI E, SHEFFIELD D, SIM J, et al. Accelerating binarized neural networks: comparison of FPGA, CPU, GPU, and ASIC[C]//Proceedings of the 2016 International Conference on Field-Programmable Technology, Xi'an, Dec 7-9, 2016. Piscataway: IEEE, 2016: 77-84.
[35] RYBALKIN V, PAPPALARDO A, GHAFFAR M, et al. FINN-L: library extensions and design trade-off analysis for variable precision LSTM networks on FPGAs[C]//Proceedings of the 2018 28th International Conference on Field Programmable Logic and Applications, Dublin, Aug 27-31, 2018. Piscataway: IEEE, 2018: 89-96.
[36] QIU J T, WANG J, YAO S, et al. Going deeper with embedded FPGA platform for convolutional neural network[C]//Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, Feb 21-23, 2016. New York: ACM, 2016: 26-35.
[37] GUAN Y J, ZHI H Y, SUN G Y, et al. FPGA-based accelerator for long short-term memory recurrent neural networks[C]//Proceedings of the 2017 22nd Asia and South Pacific Design Automation Conference, Chiba, Jan 16-19, 2017. Piscataway: IEEE, 2017: 629-634.
[38] HAN S, KANG J L, MAO H Z, et al. ESE: efficient speech recognition engine with sparse LSTM on FPGA[C]//Proceedings of the 2017 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, Feb 22-24, 2017. New York: ACM, 2017: 75-84.
[39] NURVITADHI E, SIM J, SHEFFIELD D, et al. Accelerating recurrent neural networks in analytics servers: comparison of FPGA, CPU, GPU, and ASIC[C]//Proceedings of the 26th International Conference on Field Programmable Logic and Applications, Lausanne, Aug 29-Sep 2, 2016. Piscataway: IEEE, 2016: 1-4.
[40] QU W. Optimizing and accelerating application of deep learning in image recognition based on FPGA[D]. Chengdu: University of Electronic Science and Technology of China, 2019.
屈伟. 基于FPGA的深度学习在图像识别上的优化与加速应用[D]. 成都:电子科技大学, 2019.
[41] KHAN H, KHAN A, KHAN Z, et al. NPE: an FPGA-based overlay processor for natural language processing[J]. arXiv:2104.06535, 2021.
[42] MOTAMEDI M, GYSEL P, AKELLA V, et al. Design space exploration of FPGA-based deep convolutional neural net-works[C]//Proceedings of the 21st Asia and South Pacific Design Automation Conference, Macao, China, Jan 25-28, 2016. Piscataway: IEEE, 2016: 575-580.
[43] WU J F, ZHENG B W, NIE Y, et al. FPGA accelerator of 3DES algorithm based on OpenCL[J/OL]. Computer Engineering (2020-12-11)[2021-06-13]. https://doi.org/10.19678/j.issn.1000-3428.0059799.
吴健凤, 郑博文, 聂一, 等. 基于OpenCL的3DES算法FPGA加速器[J/OL]. 计算机工程(2020-12-11)[2021-06-13]. https://doi.org/10.19678/j.issn.1000-3428.0059799.
[44] LIAN R L. A framework for FPGA-based acceleration of neural network inference with limited numerical precision via high-level synthesis with streaming functionality[D]. Toronto: University of Toronto, 2016.
[45] ALWANI M, CHEN H, FERDMAN M, et al. Fused-layer CNN accelerators[C]//Proceedings of the 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, China, Oct 15-19, 2016. Piscataway: IEEE, 2016: 12-21.
[46] GUAN Y J, LIANG H, XU N Y, et al. FP-DNN: an automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates[C]//Proceedings of the 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines, Napa, Apr 30-May 2, 2017. Piscataway: IEEE, 2017: 152-159.