Data Object Scale Aware Rank-Level Memory Allocation (基于数据对象规模的Rank级内存分配方法)  Cited by: 1

Authors: Zhong Qi[1], Wang Jing[2], Guan Xuetao[1], Huang Tao[1], Wang Keyi[1]

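Affiliations: [1] Microprocessor Research and Development Center, Peking University, Beijing 100871; [2] Beijing Engineering Research Center of High Reliable Embedded System, Capital Normal University, Beijing 100048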

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2014, No. 3, pp. 672-680 (9 pages)

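Funding: National Science and Technology Major Project of China, "核高基" program (core electronic devices, high-end generic chips, and basic software), grant 2009ZX01029-001-002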

Abstract: Main memory is organized as a bank/rank/channel hierarchy, and exploiting the parallelism and locality of this structure is an important way to improve system performance. Previous work either uses sub-rank techniques to increase the memory resources that can operate in parallel, or partitions banks among concurrently running programs to isolate memory interference. These methods, however, do not address interference among the data objects of a single program when bank and rank resources coexist. By analyzing how data is laid out in main memory, we find that a program's data tends to cluster in a single rank because of its limited working set, which leaves memory resources underutilized and degrades system performance. We propose DSRA (data object scale aware rank-level memory allocation), a software-only approach to this problem. DSRA spreads the data objects with high interference cost across different ranks, so that the additional bank/rank resources improve memory access performance. DSRA works at the operating system level and analyzes the interference cost among data objects using information provided by the compiler and the operating system, so it requires no changes to application source code and no special underlying hardware. DSRA is implemented in the Linux 2.6.32 kernel and evaluated on two real processors with memory-intensive benchmarks from the NAS Benchmark and SPEC CPU2000. The results show that, with little effect on the cache miss rate, DSRA shortens execution time by reducing the number of main memory access cycles. Compared with existing optimization techniques, it improves performance by 6.8% on average and by up to 16%.
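
The abstract does not give DSRA's cost model or placement algorithm, so the following is only an illustrative sketch of the general idea: spread data objects that interfere heavily with each other across different ranks. The function name `assign_ranks`, the pairwise `conflict` table, and the greedy largest-first policy are all assumptions made for this example, not the authors' method.

```python
from typing import Dict, FrozenSet, List


def assign_ranks(sizes: Dict[str, int],
                 conflict: Dict[FrozenSet[str], float],
                 num_ranks: int) -> Dict[str, int]:
    """Greedy placement sketch (hypothetical, not the paper's algorithm).

    sizes:     object name -> size in bytes (the "scale" of the object)
    conflict:  unordered pair of names -> assumed interference cost
    num_ranks: number of memory ranks available
    Returns a mapping from object name to rank index.
    """
    placed: List[List[str]] = [[] for _ in range(num_ranks)]
    used_bytes = [0] * num_ranks
    result: Dict[str, int] = {}

    # Visit the largest objects first; break ties by name for determinism.
    for name in sorted(sizes, key=lambda n: (-sizes[n], n)):

        def added_cost(rank: int) -> float:
            # Interference this object would add if it shared `rank`
            # with the objects already placed there.
            return sum(conflict.get(frozenset((name, other)), 0.0)
                       for other in placed[rank])

        # Prefer the rank with the lowest added cost, then the emptiest one.
        best = min(range(num_ranks),
                   key=lambda r: (added_cost(r), used_bytes[r], r))
        placed[best].append(name)
        used_bytes[best] += sizes[name]
        result[name] = best
    return result
```

For instance, with three objects `a`, `b`, `c` where `a` and `b` interfere most, calling `assign_ranks` with two ranks separates `a` from `b` and then puts `c` next to whichever neighbor costs less. A real system would additionally have to map each rank to physical page frames inside the operating system's allocator, which this sketch leaves out.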

Keywords: memory access interference; operating system; rank clustering; memory allocation; data object

Classification: TP333.1 [Automation and Computer Technology: Computer Architecture]

 
