Journal of Frontiers of Computer Science and Technology, 2023, Vol. 17, Issue (8): 1729-1748. DOI: 10.3778/j.issn.1673-9418.2210102
XU Cheng, GUO Jinyang, LI Chao, WANG Jing, WANG Taolei, ZHAO Jieru
Online:
2023-08-01
Published:
2023-08-01
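Abstract: Field programmable gate arrays (FPGAs) are currently favored by both academia and industry for their programmability and excellent energy efficiency, but the traditional FPGA development flow based on hardware description languages (HDLs) poses programming challenges. HDLs differ from the high-level languages in common use, which hinders software developers from exploiting FPGAs. High-level synthesis (HLS) lets developers carry out FPGA hardware development directly from high-level languages such as C/C++; it is the preferred solution to this problem and has received wide attention. In recent years, academia has produced many works on HLS that address the problems arising when HLS is applied and that improve the performance of systems developed with HLS. Focusing on the use of HLS to develop FPGA heterogeneous systems, this paper lists the feasible optimization directions from the perspective of a heterogeneous-system developer. At the compilation optimization level, HLS tools can automatically generate high-performance RTL designs by inserting compiler directives and by designing efficient design space exploration algorithms. At the memory access optimization level, HLS tools can set up buffers and partition and replicate data to raise overall system bandwidth. At the parallel optimization level, HLS tools can realize statement-level, task-level, and board-level parallelism. Some techniques, such as domain-specific languages (DSLs), cannot directly improve the performance of a heterogeneous acceleration system, but they can further improve the usability of HLS tools. Finally, the paper summarizes some of the challenges HLS currently faces and discusses future research directions for HLS.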
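The three optimization levels named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the kernel `dot_kernel`, the size `N`, and the factor `UNROLL` are hypothetical, and the directives follow the Vivado/Vitis HLS pragma syntax (`PIPELINE`, `UNROLL`, `ARRAY_PARTITION`). Whether a given tool reaches the requested initiation interval depends on the tool version and the target device. An ordinary C++ compiler ignores the pragmas, so the function also runs as plain software.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 1024;    // hypothetical problem size (multiple of UNROLL)
constexpr std::size_t UNROLL = 4;  // hypothetical statement-level parallelism factor

// Hypothetical kernel: dot product of two integer vectors.
std::int32_t dot_kernel(const std::int32_t a[N], const std::int32_t b[N]) {
    // Memory access optimization: stage the inputs in on-chip buffers and
    // partition the buffers cyclically so that UNROLL elements can be read
    // in the same cycle.
    std::int32_t buf_a[N];
    std::int32_t buf_b[N];
#pragma HLS ARRAY_PARTITION variable=buf_a cyclic factor=4
#pragma HLS ARRAY_PARTITION variable=buf_b cyclic factor=4

    for (std::size_t i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        buf_a[i] = a[i];
        buf_b[i] = b[i];
    }

    // One partial sum per unrolled lane avoids a single accumulator
    // becoming a loop-carried bottleneck.
    std::int32_t partial[UNROLL] = {};
#pragma HLS ARRAY_PARTITION variable=partial complete

    // Compilation and parallel optimization: the directives ask the tool to
    // pipeline the outer loop and fully unroll the inner one.
    for (std::size_t i = 0; i < N; i += UNROLL) {
#pragma HLS PIPELINE II=1
        for (std::size_t j = 0; j < UNROLL; ++j) {
#pragma HLS UNROLL
            partial[j] += buf_a[i + j] * buf_b[i + j];
        }
    }

    // Final reduction of the per-lane partial sums.
    std::int32_t sum = 0;
    for (std::size_t j = 0; j < UNROLL; ++j) {
        sum += partial[j];
    }
    return sum;
}
```

The cyclic partition factor is chosen to match the unroll factor so that the four accesses `buf_a[i + j]` in one iteration fall into four different memory banks. Task-level and board-level parallelism are outside the scope of this sketch.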
XU Cheng, GUO Jinyang, LI Chao, WANG Jing, WANG Taolei, ZHAO Jieru. Using HLS to Develop FPGA Heterogeneous Acceleration System: Problems, Optimization Methods and Opportunities[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(8): 1729-1748.