Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (7): 1570-1582. DOI: 10.3778/j.issn.1673-9418.2012085

• High Performance Computing •

Parallel Architecture Design for OpenVX Core Image Processing Functions

PAN Fengrui1,+, LI Tao1,2, XING Lidong1, ZHANG Haocong1, WU Guanzhong1

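  1. School of Electronic Engineering, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
  2. School of Computer Science & Technology, Xi’an University of Posts & Telecommunications, Xi’an 710121, China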
  • Received: 2020-12-22; Revised: 2021-02-25; Issue date: 2022-07-01; Released online: 2021-03-23
  • About the authors: PAN Fengrui, born in 1996 in Weinan, Shaanxi, M.S. candidate. Her research interest is integrated circuit system design.
    LI Tao, born in 1954 in Beijing, Ph.D., professor, member of CCF. His research interests include computer architecture, computer graphics, and large-scale integrated circuits.
    XING Lidong, born in 1980 in Shandong, Ph.D., senior engineer, member of CCF. His research interest is integrated circuit system design.
    ZHANG Haocong, born in 1996 in Weinan, Shaanxi, M.S. candidate. Her research interest is integrated circuit system design.
    WU Guanzhong, born in 1995 in Xi’an, Shaanxi, M.S. candidate. His research interest is integrated circuit system design.
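  • Funding:
    Science and Technology Overall Planning Project of Shaanxi Province (2015KTCQ013); Collaborative Innovation Center Project of the Shaanxi Provincial Department of Education (17JF032); Scientific Research Program of the Shaanxi Provincial Department of Education (20JY058)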

Parallel Architecture Design for OpenVX Kernel Image Processing Functions

PAN Fengrui1,+, LI Tao1,2, XING Lidong1, ZHANG Haocong1, WU Guanzhong1

  1. School of Electronic Engineering, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
  2. School of Computer Science & Technology, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
  • Received: 2020-12-22 Revised: 2021-02-25 Online: 2022-07-01 Published: 2021-03-23
  • Supported by:
    the Science and Technology Overall Planning Project of Shaanxi Province (2015KTCQ013); the Project of Collaborative Innovation Center of Shaanxi Provincial Department of Education (17JF032); the Scientific Research Project of Shaanxi Provincial Department of Education (20JY058)

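Abstract:

Traditional programmable processors are highly flexible, but their processing speed and performance fall short of application-specific integrated circuits (ASICs). Image processing typically involves diverse, intensive, and repetitive operations, so a processor for it must balance speed, performance, and flexibility. OpenVX is an open-source standard for preprocessing or auxiliary processing in applications such as image and graphics processing, graph computing, and deep learning. Based on the core image processing function library of the latest OpenVX 1.3 standard, this paper designs and implements a programmable, scalable application-specific instruction-set processor (ASIP), the OpenVX parallel processor. First, the topological properties of various interconnection networks are analyzed and compared, and the hierarchically cross-connected Mesh+ (HCCM+) network, which performs comparatively well, is chosen as the system backbone. Processing elements (PEs) are placed at the network nodes to form a dynamically configurable 4×4 PE array, and the parallel processor is designed around an efficient routing and communication scheme to realize programmable image processing. Second, the proposed architecture suits both data-parallel computing and emerging graph computing, and the two computing modes can be configured separately or in combination. Core vision functions and a graph computing model are each mapped onto the parallel processor to verify the two modes, and image processing speed is compared for different numbers of PEs. Experimental results show that the parallel processor can map both basic core functions and highly complex graph computing models, and that image processing is accelerated linearly in both the data-parallel and pipelined modes. With 16 PEs, the average speedup across the tested function types reaches 15.0375. The design is verified on a 20 nm XCVU440 platform device, and the frequency after synthesis and implementation is 125 MHz.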

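Key words: OpenVX core image processing functions, application-specific instruction-set processor (ASIP), parallel processor, hierarchically cross-connected Mesh+ (HCCM+), graph computing model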

Abstract:

Although traditional programmable processors are highly flexible, their processing speed and performance are inferior to those of application-specific integrated circuits (ASICs). Image processing typically involves diverse, intensive, and repetitive operations, so the processor must balance speed, performance, and flexibility. OpenVX is an open-source standard for preprocessing or auxiliary processing in image processing, graph computing, and deep learning applications. Targeting the kernel vision function library of the OpenVX 1.3 standard, this paper designs and implements a programmable and extensible OpenVX parallel processor. The architecture is an application-specific instruction-set processor (ASIP). After analyzing and comparing the topological characteristics of various interconnection networks, the hierarchically cross-connected Mesh+ (HCCM+), which shows outstanding performance, is chosen as the backbone of the ASIP, and a processing element (PE) is placed at each network node. The resulting PE array supports dynamic configuration, and the parallel processor builds on efficient routing and communication to realize programmable image processing. The proposed architecture suits both data-parallel computing and emerging graph computing, and the two computing modes can be configured separately or mixed. Kernel vision functions and a graph computing model are each mapped onto the parallel processor to verify the two modes and to compare image processing speed for different numbers of PEs. The results show that the OpenVX parallel processor can map both kernel functions and a high-complexity graph computing model and accelerate them linearly. With 16 PEs, the average speedup across the tested functions is approximately 15.0375. When implemented on an FPGA board with a 20 nm XCVU440 device, the prototype runs at 125 MHz.
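To make the data-parallel mode and the reported speedup concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes the 16 PEs are arranged as a 4×4 grid, that each PE processes one contiguous image tile, and that a 3×3 box filter with replicated (clamped) borders and integer division stands in for a core vision function. All function names here are hypothetical. The last helper computes parallel efficiency as speedup divided by PE count.

```python
# Illustrative model only: tiles are processed sequentially here,
# whereas real PEs would work on their tiles concurrently.

PE_ROWS, PE_COLS = 4, 4  # assumed 4x4 PE grid (16 PEs)


def box3x3(img, r0, r1, c0, c1):
    """3x3 mean filter over rows r0..r1-1 and columns c0..c1-1.

    Neighbours outside the image are clamped to the nearest edge pixel
    (an assumption; other border policies are possible).
    """
    h, w = len(img), len(img[0])
    out = {}
    for r in range(r0, r1):
        for c in range(c0, c1):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += img[rr][cc]
            out[(r, c)] = total // 9
    return out


def split(n, parts):
    """Split range(n) into `parts` contiguous, near-equal (start, stop) chunks."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def box3x3_tiled(img):
    """Give each hypothetical PE one tile, then merge the per-tile results."""
    h, w = len(img), len(img[0])
    result = [[0] * w for _ in range(h)]
    for r0, r1 in split(h, PE_ROWS):
        for c0, c1 in split(w, PE_COLS):
            # One loop iteration stands in for the work of one PE.
            for (r, c), value in box3x3(img, r0, r1, c0, c1).items():
                result[r][c] = value
    return result


def parallel_efficiency(speedup, num_pes):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / num_pes


# Synthetic 16x16 test image; the tiled result must match the whole-image one.
img = [[(r * 7 + c * 3) % 256 for c in range(16)] for r in range(16)]
whole = box3x3(img, 0, 16, 0, 16)
tiled = box3x3_tiled(img)
assert all(tiled[r][c] == whole[(r, c)] for r in range(16) for c in range(16))

# Efficiency implied by the reported average speedup on 16 PEs (about 0.94).
print(round(parallel_efficiency(15.0375, 16), 4))
```

Because each output pixel depends only on a fixed 3×3 input neighbourhood, tiles can be computed independently once each PE can read a one-pixel halo around its tile. That independence is what allows the speedup to approach the PE count for this class of function.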

Key words: OpenVX kernel image processing functions, application-specific instruction-set processor (ASIP), parallel processor, hierarchically cross-connected Mesh+ (HCCM+), graph computing model

CLC number: