Parallel Architecture Design for OpenVX Kernel Image Processing Functions

Cited by: 2

Authors: PAN Fengrui; LI Tao; XING Lidong; ZHANG Haocong; WU Guanzhong (School of Electronic Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121, China; School of Computer Science & Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121, China)

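Affiliations: [1] School of Electronic Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121; [2] School of Computer Science & Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121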

Source: Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2022, No. 7, pp. 1570-1582 (13 pages)

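Funding: Shaanxi Province Science and Technology Coordination Project (2015KTCQ013); Collaborative Innovation Center Project of the Shaanxi Provincial Department of Education (17JF032); Scientific Research Program of the Shaanxi Provincial Department of Education (20JY058).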

Abstract: Although traditional programmable processors are highly flexible, their processing speed and performance are inferior to those of application-specific integrated circuits (ASICs). Image processing typically involves diverse, intensive, and repetitive operations, so a processor must balance speed, performance, and flexibility. OpenVX is an open-source standard for preprocessing or auxiliary processing in image and graphics processing, graph computing, and deep learning applications. Based on the kernel image processing function library of the latest OpenVX 1.3 standard, this paper designs and implements a programmable and extensible application-specific instruction set processor (ASIP), the OpenVX parallel processor. First, the topological characteristics of various interconnection networks are analyzed and compared, and the hierarchically cross-connected Mesh+ (HCCM+), which shows comparatively strong performance, is chosen as the system backbone. A processing element (PE) is placed at each network node to form a dynamically configurable 4×4 PE array, and the parallel processor is designed around an efficient routing and communication scheme to realize programmable image processing. Second, the proposed architecture suits both data-parallel computing and emerging graph computing, and the two computing modes can be configured separately or in combination. Kernel vision functions and a graph computing model are each mapped onto the parallel processor to verify the two modes, and image processing speed is compared across different numbers of PEs. The experimental results show that the parallel processor can map both basic kernel functions and a high-complexity graph computing model, and that it achieves linear speedup of image processing in both the data-parallel and pipelined modes. With 16 PEs, the average speedup across the various functions reaches 15.0375. The design was verified on an FPGA platform with a 20 nm XCVU440 device, where the prototype runs at 125 MHz after synthesis and implementation.
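To make the reported figures concrete, the following minimal sketch shows one way an image could be divided among a 4×4 PE grid for data-parallel processing, and how the reported average speedup translates into parallel efficiency. The tiling scheme, the function names `split_tiles` and `parallel_efficiency`, and the 480×640 image size are illustrative assumptions; the abstract does not describe how the authors actually map image data onto PEs.

```python
def split_tiles(height, width, grid_rows=4, grid_cols=4):
    """Split an image into one rectangular tile per PE (hypothetical scheme).

    Returns a list of (row_start, row_end, col_start, col_end) tuples,
    with end indices exclusive, ordered row by row across the PE grid.
    """
    tiles = []
    for r in range(grid_rows):
        r0 = r * height // grid_rows
        r1 = (r + 1) * height // grid_rows
        for c in range(grid_cols):
            c0 = c * width // grid_cols
            c1 = (c + 1) * width // grid_cols
            tiles.append((r0, r1, c0, c1))
    return tiles


def parallel_efficiency(speedup, num_pes):
    """Parallel efficiency is the speedup divided by the number of PEs."""
    return speedup / num_pes


# Assumed 480x640 image; the image size is not taken from the paper.
tiles = split_tiles(480, 640)

# Average speedup of 15.0375 on 16 PEs, as reported in the abstract.
efficiency = parallel_efficiency(15.0375, 16)
print(f"{len(tiles)} tiles, efficiency = {efficiency:.4f}")
```

Under these assumptions each PE receives a 120×160 tile, and a speedup of 15.0375 on 16 PEs corresponds to a parallel efficiency of roughly 94%, which is consistent with the abstract's description of near-linear speedup.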

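Keywords: OpenVX; kernel image processing functions; application-specific instruction set processor (ASIP); parallel processor; hierarchically cross-connected Mesh+ (HCCM+); graph computing model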

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]
