A Dataflow Cache Processor Frontend Design (一种缓存数据流信息的处理器前端设计)

Times cited: 1

Authors: Liu Bingtao (刘炳涛)[1,2], Wang Da (王达)[1], Ye Xiaochun (叶笑春)[1], Zhang Hao (张浩)[1], Fan Dongrui (范东睿)[1], Zhang Zhimin (张志敏)[1]

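Affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049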

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2016, No. 6, pp. 1221-1237 (17 pages)

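Funding: National Basic Research Program of China (973 Program) (2011CB302501); National High Technology Research and Development Program of China (863 Program) (2015AA011204, 2012AA010901); National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software" (核高基) (2013ZX0102-8001-001-001); Key Program of the National Natural Science Foundation of China (61332009, 61173007)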

Abstract: To exploit both the thread-level parallelism (TLP) and the instruction-level parallelism (ILP) of programs, dynamic multi-core techniques reconfigure several small cores into a more powerful virtual core to suit the diverse demands of programs. Such a virtual core usually performs worse than a native core that occupies the same chip resources. One important reason is that the frontend pipeline stages (fetch, decode and rename) are inherently serial, so they are hard to make cooperate after reconfiguration. To solve this problem, we propose a new frontend structure, the dataflow cache, together with a matching vector renaming (VR) mechanism. The dataflow cache exploits the dataflow locality of programs by storing and reusing the data dependencies and other information within instruction basic blocks. With the dataflow cache, a processor core can exploit more instruction-level parallelism and reduce the branch misprediction penalty; in a dynamic multi-core processor, the virtual core uses the dataflow cache to bypass the traditional frontend stages, which removes the difficulty of coordinating them. Experiments on SPEC CPU2006 programs show that the dataflow cache covers more than 90% of the dynamic instructions of most programs at limited cost. We then analyze how adding the dataflow cache affects pipeline performance. With a 4-instruction-wide frontend and a 512-entry instruction window, the dataflow cache improves the performance of the virtual core by 9.4% on average, and by up to 28% for some programs.
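The abstract describes the mechanism only at a high level. The following Python sketch is our own illustration of how a dataflow cache and vector renaming could fit together, not the authors' design. Everything in it is an assumption: the names `Instr`, `DataflowEntry`, `analyze_block`, `DataflowCache`, `vector_rename` and `coverage` are hypothetical, entries are keyed by basic-block start PC with LRU replacement, and "coverage" is counted as the fraction of dynamic instructions delivered by cache hits. On a miss, the block's intra-block dependencies are extracted serially, as a conventional frontend would do; on a hit, the cached dependency indices let every source operand be mapped independently, and the whole vector of destination registers is allocated in one step.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Instr:
    op: str
    dst: Optional[int]       # architectural destination register, or None
    srcs: Tuple[int, ...]    # architectural source registers

# Hypothetical operand encoding: ("int", k) = produced by instruction k of the
# same block; ("in", r) = live-in value taken from architectural register r.
Operand = Tuple[str, int]

@dataclass
class DataflowEntry:
    ops: List[str]
    dsts: List[Optional[int]]   # architectural destination of each instruction
    deps: List[List[Operand]]   # intra-block dependency info for each instruction
    live_out: Dict[int, int]    # architectural register -> index of its last writer

def analyze_block(block: List[Instr]) -> DataflowEntry:
    """Slow path: extract the block's dataflow serially, as a traditional frontend would."""
    last_writer: Dict[int, int] = {}
    deps: List[List[Operand]] = []
    for i, ins in enumerate(block):
        deps.append([("int", last_writer[r]) if r in last_writer else ("in", r)
                     for r in ins.srcs])
        if ins.dst is not None:
            last_writer[ins.dst] = i
    return DataflowEntry([x.op for x in block], [x.dst for x in block], deps, dict(last_writer))

class DataflowCache:
    """Basic-block start PC -> cached DataflowEntry, with LRU replacement (an assumption)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[int, DataflowEntry]" = OrderedDict()
        self.hits = self.misses = 0
        self.covered_insts = self.total_insts = 0

    def lookup(self, pc: int, fetch_block: Callable[[int], List[Instr]]) -> DataflowEntry:
        entry = self.entries.get(pc)
        if entry is not None:
            self.hits += 1
            self.covered_insts += len(entry.ops)
            self.entries.move_to_end(pc)          # mark as most recently used
        else:
            self.misses += 1
            entry = analyze_block(fetch_block(pc))
            self.entries[pc] = entry
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used block
        self.total_insts += len(entry.ops)
        return entry

    def coverage(self) -> float:
        """Fraction of dynamic instructions delivered by cache hits."""
        return self.covered_insts / self.total_insts if self.total_insts else 0.0

def vector_rename(entry: DataflowEntry, rename_map: Dict[int, int],
                  free_list: List[int]) -> List[Tuple[str, Optional[int], List[int]]]:
    """Rename a whole cached block at once instead of instruction by instruction."""
    n_dst = sum(d is not None for d in entry.dsts)
    if n_dst > len(free_list):
        raise RuntimeError("not enough free physical registers; rename must stall")
    tags = iter(free_list[:n_dst])        # allocate the destination vector in one step
    del free_list[:n_dst]
    phys_dst = [next(tags) if d is not None else None for d in entry.dsts]
    renamed = []
    for op, deps, pd in zip(entry.ops, entry.deps, phys_dst):
        # The cached index names the producer directly, so no serial comparison
        # against earlier instructions of the block is required.
        srcs = [phys_dst[k] if kind == "int" else rename_map[k] for kind, k in deps]
        renamed.append((op, pd, srcs))
    for reg, idx in entry.live_out.items():   # publish the block's live-out mappings
        rename_map[reg] = phys_dst[idx]
    return renamed

if __name__ == "__main__":
    blocks = {0x400: [Instr("add", 1, (2, 3)),    # r1 = r2 + r3
                      Instr("mul", 4, (1, 1)),    # r4 = r1 * r1
                      Instr("sub", 1, (4, 2))]}   # r1 = r4 - r2
    cache = DataflowCache(capacity=64)
    cache.lookup(0x400, blocks.__getitem__)           # miss: analyzed serially, then cached
    entry = cache.lookup(0x400, blocks.__getitem__)   # hit: dataflow info reused
    rename_map = {r: 100 + r for r in range(8)}
    print(vector_rename(entry, rename_map, [200, 201, 202, 203]))
    # [('add', 200, [102, 103]), ('mul', 201, [200, 200]), ('sub', 202, [201, 102])]
    print(cache.coverage())  # 0.5
```

The first execution of a block pays the full serial cost; later executions of the same block reuse its dataflow information, which is where the dataflow locality named in the abstract would pay off. The sketch does not model branch prediction, so the reduced misprediction penalty reported in the abstract is not represented.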

Keywords: processor microarchitecture; instruction cache; dataflow; instruction renaming; dataflow locality

CLC number: TP303 [Automation and Computer Technology / Computer System Architecture]
