Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (7): 1583-1593. DOI: 10.3778/j.issn.1673-9418.2012080

• High Performance Computing •

Parallel Implementation of OpenVX Feature Extraction Functions in Programmable Processing Architecture

ZHANG Haocong1,+, LI Tao1,2, XING Lidong1, PAN Fengrui1

    1. School of Electronic Engineering, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
    2. School of Computer Science & Technology, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
  • Received: 2020-12-21 Revised: 2021-02-25 Online: 2022-07-01 Published: 2021-03-23
  • Supported by:
    the Science and Technology Overall Planning Project of Shaanxi Province (2015KTCQ013); the Project of Collaborative Innovation Center of Shaanxi Provincial Department of Education (17JF032); the Scientific Research Project of Shaanxi Provincial Department of Education (20JY058)

  • About the authors:
    ZHANG Haocong, born in 1996 in Weinan, Shaanxi, M.S. candidate. Her research interests include integrated circuit system design and digital image processing.
    LI Tao, born in 1954 in Beijing, Ph.D., professor, member of CCF. His research interests include integrated circuit system design, artificial neural networks and machine learning.
    XING Lidong, born in 1980 in Weifang, Shandong, Ph.D., senior engineer, member of CCF. His research interests include integrated circuit system design and high-speed digital signal processing.
    PAN Fengrui, born in 1996 in Weinan, Shaanxi, M.S. candidate. Her research interests include integrated circuit system design and digital image processing.

Abstract:

To address the heavy computational load and slow serial execution of digital image processing, this paper implements in parallel the low-level feature extraction kernel functions of the latest open-source OpenVX specification 1.3 and verifies them on a self-designed OpenVX programmable parallel processor. For low-level feature extraction, the basic pixel processing function Color Convert and the local image processing functions Gaussian Filter and Median Filter from OpenVX specification 1.3 are selected for preliminary filtering and smoothing, and the Harris Corners and Canny Edge Detector kernel functions are selected for the core feature extraction. Complex, computation-heavy nodes are split into several simple nodes, and different graph-based execution models are constructed and mapped onto the OpenVX parallel processor to perform image edge detection and feature point extraction, respectively. The overall hardware circuit is designed in Verilog and is synthesized and verified on the Xilinx FPGA chip xcvu440-flga-2892-2-e. Compared with the serial mapping structure, the selected kernel functions reach a parallel speedup of up to 14.269 on the OpenVX programmable parallel processor. Experimental results show that the kernel functions in OpenVX specification 1.3, especially the complex ones, achieve the expected acceleration on this parallel processing structure, and that the speedup of the parallel structure over the serial one increases linearly.

Key words: OpenVX specification 1.3, computer vision function, low-level feature extraction, graph execution model, parallel processor
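
The idea of splitting a complex node into simple nodes and executing the resulting graph in parallel can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the node names, the dependency edges, the unit cost per node, and the idealization that every node in a level runs on its own processing element. It is not the decomposition or the processor mapping used in the paper, and its numbers are unrelated to the reported speedup.

```python
from collections import defaultdict

# Hypothetical decomposition of a Harris Corners pipeline into simple nodes.
# Each entry maps a node to the nodes it depends on (all names are assumed).
GRAPH = {
    "color_convert": [],
    "gaussian": ["color_convert"],
    "sobel_x": ["gaussian"],
    "sobel_y": ["gaussian"],
    "ixx": ["sobel_x"],
    "iyy": ["sobel_y"],
    "ixy": ["sobel_x", "sobel_y"],
    "response": ["ixx", "iyy", "ixy"],
}

# Assumed cost: one time unit per node.
COST = {name: 1 for name in GRAPH}


def levels(graph):
    """Group nodes by dependency depth; nodes in one level are independent."""
    depth = {}

    def visit(node):
        if node not in depth:
            # A node with no predecessors gets depth 0.
            depth[node] = 1 + max((visit(p) for p in graph[node]), default=-1)
        return depth[node]

    for node in graph:
        visit(node)

    grouped = defaultdict(list)
    for node, d in depth.items():
        grouped[d].append(node)
    return [sorted(grouped[d]) for d in sorted(grouped)]


def speedup(graph, cost):
    """Serial time divided by level-parallel time (idealized, no overhead)."""
    serial = sum(cost.values())
    parallel = sum(max(cost[n] for n in level) for level in levels(graph))
    return serial / parallel


if __name__ == "__main__":
    for i, level in enumerate(levels(GRAPH)):
        print(i, level)
    print("idealized speedup:", speedup(GRAPH, COST))
```

In this toy graph the eight nodes fall into five levels, so the idealized speedup is 8/5. A real mapping would also have to account for data transfer between nodes and for the limited number of processing elements.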


CLC Number: