Inter-Procedural Register Allocation for RSE Optimization

Original Chinese title: 优化RSE开销的过程间栈寄存器分配


Authors: Liu Yang (刘旸) [1], Zhang Zhaoqing (张兆庆) [2]

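Affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; [2] Graduate School of the Chinese Academy of Sciences, Beijing 100080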

Source: Chinese Journal of Computers (《计算机学报》), 2004, No. 9, pp. 1198-1206 (9 pages)

Funding: National Natural Science Foundation of China (69933020); supported by Intel Corporation

Abstract: The Itanium® architecture introduces a hardware-managed register stack. The register stack engine (RSE) automatically adjusts the register stack frame pointers and spills and fills stacked registers, which reduces the saving and reloading of register values across procedure calls. Each procedure states the number of stacked registers it uses explicitly with the alloc instruction. Conventional intra-procedural register allocation gives a procedure as many stacked registers as it needs, up to the total number of stacked registers available. Heavy use of stacked registers, however, causes register stack spills and fills, and when these occur frequently they seriously degrade program performance. This paper proposes a new algorithm that effectively reduces RSE cost. The algorithm has been implemented in the open-source compiler ORC. Experiments show that it improves performance across SPECint2000: perlbmk improves by 14% and crafty by 3.2%.
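To make the cost described in the abstract concrete, the following is a minimal toy model of register stack overflow, not the algorithm proposed in the paper. It assumes 96 physical stacked registers (r32-r127 on Itanium), ignores the overlap between a caller's output registers and a callee's input registers, and assumes the RSE spills only on overflow and fills only on return. The function name `rse_traffic` and the trace encoding are hypothetical, chosen only for illustration.

```python
PHYSICAL_STACKED_REGS = 96  # Itanium stacked general registers r32-r127


def rse_traffic(trace, capacity=PHYSICAL_STACKED_REGS):
    """Count registers spilled and filled for a call/return trace.

    trace: iterable of ints. A positive n means "call a procedure whose
    alloc requests n stacked registers"; 0 means "return".
    Simplified model (assumption): spill only on overflow, fill only
    when a return exposes a caller frame that is partly in memory.
    """
    frames = []      # sizes of live frames, innermost last
    total = 0        # registers allocated by all live frames
    in_memory = 0    # oldest registers currently in the backing store
    spills = fills = 0
    for n in trace:
        if n > 0:  # call: the callee allocates n stacked registers
            if n > capacity:
                raise ValueError("frame larger than the register stack")
            frames.append(n)
            total += n
            overflow = total - in_memory - capacity
            if overflow > 0:  # oldest registers go to the backing store
                in_memory += overflow
                spills += overflow
        else:  # return: the caller's frame must be fully resident again
            total -= frames.pop()
            if frames:
                caller_base = total - frames[-1]
                missing = in_memory - caller_base
                if missing > 0:
                    in_memory -= missing
                    fills += missing
    return spills, fills
```

In this model, a chain of four nested calls that each request 40 stacked registers, `rse_traffic([40, 40, 40, 40, 0, 0, 0, 0])`, spills 64 registers and fills 64, while the same chain with 20 registers per frame causes no RSE traffic at all. This is the trade-off the abstract points to: giving every procedure the maximum number of stacked registers it could use is not always the cheapest choice once calls are taken into account.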

Keywords: register stack; register stack engine (RSE); register stack spill/fill

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]

 
