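Authors: Lin Junmin (林隽民) [1], Chen Yu (陈彧) [1], Li Wenlong (李文龙), Qiao Lin (乔林) [1], Tang Zhizhong (汤志忠) [1]
Affiliations: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084; [2] Intel China Research Center, Beijing 100080
Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》), 2011, No. 8, pp. 1055-1062, 1071 (9 pages)
Funding: National Natural Science Foundation of China (60573100; 60773149); National "863" High-Tech Program (2008AA01Z108); National "973" Key Basic Research Program (2007CB310900)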
Abstract: Workload characterization is a key preliminary step in the design of last-level caches (LLCs) for multi-core processors. This paper analyzes the memory behavior of a set of memory-intensive, multi-threaded RMS (recognition, mining, and synthesis) workloads for future multi-core processors, including working set sizes, data sharing behavior, and spatial data locality. The analysis shows that these RMS workloads are memory intensive, with large working sets, a significant amount of data sharing, and strong strided access patterns. The LLC design space is then explored for these workloads, and potential architectural choices for future multi-core cache design are discussed based on the observations. The experimental results show that large DRAM caches can effectively satisfy the capacity demand caused by large working sets, with a 128 MB DRAM cache reducing the L1 miss penalty by 18% on average compared with no DRAM cache; that a shared cache performs better than private caches at the LLC level, with an 8 MB shared cache improving cache performance by 25% compared with private caches of the same total size; and that a stride-based hardware prefetching mechanism improves performance by 25%. Consequently, a memory hierarchy consisting of a 128 MB DRAM cache, an 8 MB on-die shared SRAM cache, and an 8-entry stride prefetcher is recommended for memory-intensive RMS workloads.
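Keywords: chip multiprocessor; on-chip cache; workload characterization; memory access performance; RMS workloads
Classification number: TP393.03 [Automation and Computer Technology: Computer Application Technology]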