Authors: 潘风蕊 (PAN Fengrui); 李涛 (LI Tao); 邢立冬 (XING Lidong); 张好聪 (ZHANG Haocong); 吴冠中 (WU Guanzhong)
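Affiliations: [1] School of Electronic Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121, China; [2] School of Computer Science & Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121, China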
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2022, No. 7, pp. 1570-1582 (13 pages)
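Funding: Shaanxi Province Science and Technology Coordination Project (2015KTCQ013); Shaanxi Provincial Department of Education Collaborative Innovation Center Project (17JF032); Shaanxi Provincial Department of Education Scientific Research Program (20JY058).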
Abstract: Traditional programmable processors are highly flexible, but their processing speed and performance fall short of application-specific integrated circuits (ASICs). Image processing tends to involve diverse, intensive, and repetitive operations, so a processor for it must balance speed, performance, and flexibility. OpenVX is an open-source standard for preprocessing or auxiliary processing in image and graphics processing, graph computing, and deep learning applications. Based on the core image processing function library of the latest OpenVX 1.3 standard, this paper designs and implements a programmable, extensible application-specific instruction set processor (ASIP): the OpenVX parallel processor. First, the topological characteristics of various interconnection networks are analyzed and compared, and the hierarchically cross-connected Mesh+ (HCCM+), which shows comparatively strong performance, is chosen as the system backbone. Processing elements (PEs) are placed at the network nodes to form a dynamically configurable 4×4 PE array, and the parallel processor is designed around an efficient routing and communication scheme to realize programmable image processing. Second, the proposed architecture suits both data-parallel computing and emerging graph computing, and the two computing modes can be configured separately or in combination. Core vision functions and a graph computing model are each mapped onto the parallel processor to verify the two modes, and image processing speed is compared across different numbers of PEs. Experimental results show that the parallel processor can map both basic core functions and a high-complexity graph computing model, and that it achieves linear speedup of image processing in both the data-parallel and pipelined modes. With 16 PEs, the average speedup across the functions reaches 15.0375. The design was verified on a 20 nm XCVU440 platform device and runs at 125 MHz after synthesis and implementation.
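Keywords: OpenVX core image processing functions; application-specific instruction set processor (ASIP); parallel processor; hierarchically cross-connected Mesh+ (HCCM+); graph computing model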
Classification number: TP302 [Automation and Computer Technology: Computer System Architecture]