The Architecture and the Programming Model of a Data-Flow-Like Driven Tiled Stream Processor


Authors: Xu Guang[1,2], An Hong[1,2], Xu Mu[1,2], Liu Gu[1,2], Yao Ping[1,2], Ren Yongqing[1,2], Wang Fang[1,2]

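Affiliations: [1] School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; [2] Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190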

Source: Journal of Computer Research and Development, 2010, No. 9, pp. 1643-1653 (11 pages)

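Funding: Key Program of the National Natural Science Foundation of China (60633040); National Natural Science Foundation of China (60736012); National Basic Research Program of China ("973" Program) (2005CB321601); Major Project of the National "863" Program (2006AA01A102); National High-Tech Research and Development Program of China ("863" Program) (2009AA01Z106); MOE-Intel Information Technology Special Research Fund (MOE-INTEL-08-07)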

Abstract: As semiconductor technology scales, wire delay grows, which makes distributed, tiled processor architectures increasingly attractive. In a conventional stream processor, the control signals issued by the stream controller must travel over long wires and therefore suffer from wire delay. Each cluster of a conventional stream processor consists of many functional units, and because communication among clusters is centrally controlled, the inter-cluster network scales poorly as wire delay grows. This paper proposes TPA-PD, a tiled stream processor architecture that connects tiled components through a distributed network, avoiding long-wire delays in control-signal propagation. At the kernel level, TPA-PD adopts a data-flow-like execution model, namely explicit data graph execution: dependences between instructions are statically encoded in the instructions themselves, and the centralized inter-cluster communication of conventional stream processors is replaced by dynamically issued, distributed communication, which scales better with wire delay and eases architectural scaling. The paper describes the new execution model, the instruction set, the microarchitecture, and how the stream programming model is mapped onto TPA-PD. Finally, experiments on a cycle-accurate simulator analyze the hardware and software factors that affect kernel-level execution time; across eight benchmarks, TPA-PD achieves an average speedup of 20% over a conventional stream processor.
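The kernel-level execution model summarized above, in which dependences are encoded statically in each instruction and an instruction issues as soon as all of its operands have arrived, can be illustrated with a small interpreter. This is a minimal Python sketch under stated assumptions: the instruction format (`Inst`, carrying a list of `(consumer, operand slot)` targets), the opcodes, and the helpers `run` and `make_block` are hypothetical illustrations and are not the TPA-PD instruction set; tiles, network latency, and block commit are ignored.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Inst:
    """Hypothetical target-form instruction (not the TPA-PD encoding)."""
    op: str                                      # "const", "add", "mul" or "out"
    arity: int                                   # operands needed before it may fire
    targets: list = field(default_factory=list)  # (consumer index, operand slot) pairs
    imm: int = 0                                 # immediate value used by "const"


def run(block):
    """Fire instructions in data-flow order; return (outputs, firing order)."""
    buffers = [[None] * inst.arity for inst in block]  # per-instruction operand slots
    ready = deque(i for i, inst in enumerate(block) if inst.arity == 0)
    outputs, order = [], []
    while ready:
        i = ready.popleft()
        inst, ops = block[i], buffers[i]
        order.append(i)
        if inst.op == "const":
            value = inst.imm
        elif inst.op == "add":
            value = ops[0] + ops[1]
        elif inst.op == "mul":
            value = ops[0] * ops[1]
        elif inst.op == "out":
            outputs.append(ops[0])
            continue
        else:
            raise ValueError(f"unknown op {inst.op!r}")
        # Forward the result directly to the consumers named in the encoding;
        # there is no shared register file and no central scheduler.
        for consumer, slot in inst.targets:
            buffers[consumer][slot] = value
            if all(v is not None for v in buffers[consumer]):
                ready.append(consumer)  # all operands present: issue dynamically
    return outputs, order


def make_block(a, b, c):
    """Hypothetical kernel body computing (a + b) * (a + c) for one element."""
    return [
        Inst("const", 0, [(3, 0), (4, 0)], imm=a),  # 0: a feeds both adds
        Inst("const", 0, [(3, 1)], imm=b),          # 1
        Inst("const", 0, [(4, 1)], imm=c),          # 2
        Inst("add", 2, [(5, 0)]),                   # 3: a + b
        Inst("add", 2, [(5, 1)]),                   # 4: a + c
        Inst("mul", 2, [(6, 0)]),                   # 5: product
        Inst("out", 1),                             # 6: emit to output stream
    ]


# Stream-style use: apply the same kernel block to every input record.
stream = [(3, 4, 5), (1, 2, 3)]
print([run(make_block(*rec))[0][0] for rec in stream])
```

In this sketch each result travels only to the consumers listed in its producer's target field, which is the property that allows operand communication to be distributed across tiles instead of being routed through a centrally controlled network.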

Keywords: wire delay; stream processor; tiled; data-flow-like driven; processor architecture

CLC number: TP302 [Automation and Computer Technology / Computer System Architecture]
