FPGA-Based Hardware Accelerator Design and Implementation of Oil Palm Detection

doi:10.3778/j.issn.1673-9418.1912029

Abstract

Chinese abstract (English translation):

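To address the low accuracy and low efficiency of deep learning for palm tree detection in high-resolution remote sensing images, this paper proposes an effective and reliable solution from two directions: algorithm optimization and acceleration on a heterogeneous hardware platform. Taking the YOLOv3 object detection algorithm as an example, it adopts the optimization strategies of expanding feature selection and strengthening multi-scale feature fusion, which improve the detection accuracy for palm trees in high-resolution images. During forward inference, many application scenarios require high model performance while imposing strict power limits. To meet this requirement, an efficient SIMD-based convolution engine is designed using 8-bit integer weight quantization and compute-core reuse. In addition, the input module is accelerated: the input image is dimension-transformed and vectorized, then delivered to the input module through a write queue, which improves bus bandwidth utilization. Experimental results show that the optimized model reaches an accuracy of 97.84% and achieves 1.4 TOPS on a heterogeneous hardware platform based on Intel Arria 10. Compared with an i9-9980XE CPU, it delivers 7.51 times the performance and 33.02 times the energy efficiency; compared with Nvidia's dedicated inference accelerator P40, it delivers 1.2 times the energy efficiency.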
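The two CPU comparisons also pin down the power ratio between the platforms. Assuming energy efficiency $E$ means throughput per watt (the abstract does not define it), and writing $P$ for throughput and $W$ for power:

$$
\frac{E_{\text{FPGA}}}{E_{\text{CPU}}}
= \frac{P_{\text{FPGA}} / W_{\text{FPGA}}}{P_{\text{CPU}} / W_{\text{CPU}}}
= \frac{P_{\text{FPGA}}}{P_{\text{CPU}}} \cdot \frac{W_{\text{CPU}}}{W_{\text{FPGA}}}
\;\Longrightarrow\;
\frac{W_{\text{CPU}}}{W_{\text{FPGA}}}
= \frac{33.02}{7.51} \approx 4.40
$$

Under that assumption, the FPGA platform draws about 23% of the i9-9980XE's power ($1/4.40 \approx 0.227$) while delivering 7.51 times its throughput.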

Key words: field-programmable gate array (FPGA), improved YOLOv3, palm tree, hardware accelerator
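The abstract does not specify the quantization scheme behind its 8-bit integer weights. As a minimal sketch, assuming symmetric per-tensor quantization with a single scale taken from the largest absolute weight (a common choice, not necessarily the authors'), the weight-quantization step could look like this. The function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization (assumed scheme, for illustration)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to scale 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to nearest (Python's round() breaks ties to even) and clamp to [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [v * scale for v in q]
```

For example, `quantize_int8([0.5, -1.27, 0.01, 1.0])` returns the codes `[50, -127, 1, 100]` with a scale of about 0.01. Per-channel scales, zero points, and activation quantization, which a complete accelerator would also need, are left out of this sketch.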

English abstract:

Aiming at the low accuracy and low efficiency of deep-learning-based oil palm detection in high-resolution images, this paper proposes an effective and reliable solution from two aspects: algorithm optimization and heterogeneous hardware acceleration. Taking the YOLOv3 object detection algorithm as an example, it adopts the strategies of expanding feature selection and strengthening multi-scale feature fusion to improve detection accuracy for oil palms in high-resolution images. In addition, many inference applications require high-performance models under strict power constraints. To address this, the paper uses 8-bit integer weight quantization and computational-unit reuse to design a high-efficiency SIMD-based convolution engine. It also accelerates the input module: the input image is dimension-transformed and vectorized, then transferred to the input module through a write queue, which greatly improves bus bandwidth utilization. Experimental results show that the improved model reaches an accuracy of 97.84% and a performance of 1.4 TOPS on an Intel Arria 10 FPGA platform. Compared with the i9-9980XE CPU, it achieves 7.51 times the performance and 33.02 times the energy efficiency, and it is 1.2 times as energy efficient as Nvidia's dedicated P40 inference accelerator.
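The abstract names an SIMD convolution engine built on int8 weights and computational-unit reuse but gives no microarchitecture. The sketch below is only an illustration under assumed parameters: a hypothetical `VEC_WIDTH` of 8 lanes, vectorization along the input-channel axis with zero padding of the last chunk, and a single `simd_mac` function standing in for the one multiply-accumulate array that every output pixel and filter is scheduled onto.

```python
from typing import List

VEC_WIDTH = 8  # hypothetical number of SIMD lanes; not taken from the paper


def simd_mac(acc: int, a: List[int], b: List[int]) -> int:
    """One pass through a VEC_WIDTH-lane int8 multiply-accumulate unit.

    Python ints never overflow; in hardware the int8 x int8 products would be
    summed into a wider (e.g. 32-bit) accumulator.
    """
    assert len(a) == len(b) == VEC_WIDTH
    return acc + sum(x * y for x, y in zip(a, b))


def conv2d_int8(inp, weights, stride=1):
    """Valid (unpadded) convolution with every MAC routed through simd_mac.

    inp:     [C][H][W] int8 activations
    weights: [K][C][R][S] int8 filters
    returns: [K][out_h][out_w] integer accumulators
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    out_h = (H - R) // stride + 1
    out_w = (W - S) // stride + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(K)]
    for k in range(K):
        for oy in range(out_h):
            for ox in range(out_w):
                acc = 0
                for r in range(R):
                    for s in range(S):
                        iy, ix = oy * stride + r, ox * stride + s
                        # Feed the channel axis to the shared unit VEC_WIDTH values
                        # at a time, zero-padding the final partial chunk.
                        for c0 in range(0, C, VEC_WIDTH):
                            lanes = range(c0, c0 + VEC_WIDTH)
                            a = [inp[c][iy][ix] if c < C else 0 for c in lanes]
                            b = [weights[k][c][r][s] if c < C else 0 for c in lanes]
                            acc = simd_mac(acc, a, b)
                out[k][oy][ox] = acc
    return out
```

Because every filter and output position funnels through the same `simd_mac` call, the sketch mirrors the idea of time-multiplexing one compute unit rather than instantiating a dedicated unit per layer; on an FPGA, the surrounding loops would become the control logic that streams operands into that unit.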

Key words: field-programmable gate array (FPGA), improved YOLOv3, oil palm, hardware accelerator
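The abstract says only that the input image is dimension-transformed and vectorized, then written to the input module through a queue. A minimal sketch under hypothetical choices: the transform is taken to be interleaved HWC to planar CHW, vectorization packs the flattened data into fixed-width words of `VEC` elements, and the write queue is modeled with Python's bounded `queue.Queue` feeding a consumer thread that stands in for the accelerator's input module. None of these names or widths come from the paper.

```python
import queue
import threading
from typing import List, Tuple

VEC = 4  # hypothetical number of elements packed into one bus word


def hwc_to_chw(img: List[List[List[int]]]) -> List[List[List[int]]]:
    """Reorder an H x W x C (interleaved) image into C x H x W planes."""
    H, W, C = len(img), len(img[0]), len(img[0][0])
    return [[[img[y][x][c] for x in range(W)] for y in range(H)] for c in range(C)]


def vectorize(chw: List[List[List[int]]], vec: int = VEC) -> List[Tuple[int, ...]]:
    """Flatten CHW data and pack it into `vec`-wide words, zero-padding the tail."""
    flat = [v for plane in chw for row in plane for v in row]
    flat += [0] * (-len(flat) % vec)
    return [tuple(flat[i:i + vec]) for i in range(0, len(flat), vec)]


def stream_to_input_module(words: List[Tuple[int, ...]], maxsize: int = 16) -> List[Tuple[int, ...]]:
    """Write packed words into a bounded queue drained by a consumer thread."""
    q: "queue.Queue" = queue.Queue(maxsize=maxsize)
    received: List[Tuple[int, ...]] = []
    sentinel = None  # end-of-stream marker; packed words are tuples, never None

    def input_module() -> None:
        while True:
            word = q.get()
            if word is sentinel:
                break
            received.append(word)

    consumer = threading.Thread(target=input_module)
    consumer.start()
    for word in words:
        q.put(word)  # blocks while the queue is full, giving back-pressure
    q.put(sentinel)
    consumer.join()
    return received
```

The planar layout lets each channel be read as one contiguous run, so full-width words can be sent on every transfer instead of many narrow per-pixel writes, which is the kind of bus-utilization gain the abstract describes.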

YUAN Ming, CHAI Zhilei, GAN Lin. FPGA-Based Hardware Accelerator Design and Implementation of Oil Palm Detection[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(2): 315-326.
