Source: Chinese Journal of Computers (《计算机学报》), 2011, No. 1, pp. 182-192 (11 pages)
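Funding: Supported by the National Natural Science Foundation of China (60236030); the Tsinghua University Basic Research Fund; and the National Doctoral Program / Postdoctoral Project Fund (20050003083).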
Abstract: Applications, especially multimedia processing applications, place increasingly heavy performance demands on today's processors, which must exploit as much of the instruction-level parallelism (ILP) available in programs as possible. However, ILP-oriented hardware and aggressive instruction scheduling techniques put great pressure on register resources. With a single monolithic register file, it is very difficult to sustain both high ILP and a high clock rate: once ILP is high enough, the limit on the number of register file access ports makes register file access time the bottleneck for further performance improvement. To support high ILP while retaining a high clock rate, registers and functional units can be partitioned into separate clusters. Functional units in a cluster directly access only that cluster's register file, while communicating data between clusters requires dedicated resources. A compiler for a clustered architecture must therefore not only maximize parallelism through scheduling but also place instructions carefully to limit inter-cluster data communication. This work analyzes and partitions the Data Dependence Graph (DDG) to minimize inter-cluster data communication while balancing the utilization of all clusters, thereby increasing the achievable ILP and optimizing scheduling performance for clustered VLIW architectures. Experimental results show that the proposed method greatly reduces inter-cluster data communication and raises the achieved ILP, which in turn improves the performance of the resulting schedules.
Keywords: clustering; VLIW architecture; data dependence graph; instruction scheduling; inter-cluster data communication
Classification: TP302 [Automation and Computer Technology: Computer System Architecture]
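The abstract describes the goal of DDG partitioning (few inter-cluster transfers, balanced clusters) but this record does not give the paper's algorithm. The following is only a minimal illustrative sketch of that trade-off, assuming a simple greedy heuristic that is not taken from the paper. The function names `partition_ddg` and `inter_cluster_copies`, the input format, and the per-cluster capacity rule are all hypothetical.

```python
from collections import defaultdict


def partition_ddg(nodes, edges, num_clusters=2):
    """Greedily assign DDG nodes to clusters (illustrative heuristic only).

    nodes: instruction ids listed in topological order (assumed input format)
    edges: (producer, consumer) data-dependence pairs
    """
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    # Balance rule (an assumption): no cluster holds more than ceil(n / k) nodes.
    cap = -(-len(nodes) // num_clusters)
    load = [0] * num_clusters
    cluster_of = {}

    for node in nodes:
        best = None
        for c in range(num_clusters):
            if load[c] >= cap:
                continue  # cluster is full, skip it to keep the load balanced
            # Cost: operands that would have to be moved in from another cluster.
            cut = sum(1 for p in preds[node] if cluster_of[p] != c)
            key = (cut, load[c], c)  # fewest transfers, then lightest cluster
            if best is None or key < best[0]:
                best = (key, c)
        cluster_of[node] = best[1]
        load[best[1]] += 1
    return cluster_of


def inter_cluster_copies(edges, cluster_of):
    """Count dependence edges whose endpoints sit in different clusters."""
    return sum(1 for src, dst in edges if cluster_of[src] != cluster_of[dst])
```

For two independent dependence chains interleaved in program order, for example `a1 -> a2 -> a3` and `b1 -> b2 -> b3`, this heuristic places each chain in its own cluster, so `inter_cluster_copies` returns 0. A real scheduler would also have to account for operation latencies and the cost of the inter-cluster transfer resources, which this sketch ignores.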