Research on Cache Access Equalization in Chip Multi-Processor (片上多核处理器Cache访问均衡性研究)

Cited by: 3

Authors: WANG Zi-Cong; CHEN Xiao-Wen[1]; GUO Yang[1] (College of Computer, National University of Defense Technology, Changsha 410073)

Affiliation: [1] College of Computer, National University of Defense Technology

Source: Chinese Journal of Computers (《计算机学报》), 2019, No. 11, pp. 2403-2416 (14 pages)

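Funding: Supported by the National Natural Science Foundation of China (61502508, 61572025) and the Natural Science Foundation of Hunan Province (2015JJ3017)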

Abstract: As chip multi-processors (CMPs) scale up and core counts increase, systems demand more capacity and speed from the on-chip cache. To use cache resources effectively, the non-uniform cache architecture (NUCA) was proposed to support high-capacity, low-latency cache organization. Meanwhile, networks-on-chip (NoC) scale well and therefore have significant advantages as the CMP interconnect. NoC-based NUCA is consequently becoming the mainstream architecture for organizing large caches. In such a system, the last-level cache (LLC) is typically physically distributed across the processing nodes, and the cache banks logically form a single unified shared cache. When a core issues a cache access request, the access time depends on the network distance between the requesting core's node and the node of the bank that holds the data: nearby banks are fast to access, and distant banks are slow. As the system grows, this dependence of access latency on network distance widens the differences in communication distance and latency between nodes. A larger NoC also makes network latency increasingly dominate cache access latency. This latency gap causes imbalanced network packet latency, further increases the non-uniformity of cache access latency, and produces more high-latency cache accesses that become the bottleneck of system performance. Studying cache access equalization in CMPs is therefore valuable for improving network and system performance. This paper analyzes the causes of imbalanced cache access latency and, targeting the two sources of latency (non-contention latency and contention latency), proposes two design methods: non-uniform memory mapping and non-uniform link distribution. With non-uniform memory mapping, we adjust the proportion of cache blocks mapped to each cache bank according to the bank's physical location in the network, which equalizes the average access distance of cache requests. With non-uniform link distribution, we assign each link a suitable number of channels according to its traffic load, which relieves packet contention on heavily loaded links. Experiments on a full-system simulator …
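The two quantities that the abstract says drive the designs, per-bank average access distance and per-link traffic load, can be illustrated with a short Python sketch. Everything here is an assumption for illustration and is not taken from the paper: a square 2D mesh, hop count as Manhattan distance, uniform all-to-all traffic, dimension-ordered (XY) routing, and the function names themselves. The inverse-distance weighting in `mapping_weights` is a hypothetical heuristic, not the authors' mapping scheme.

```python
from itertools import product


def mean_hop_distance(n):
    """Mean Manhattan distance from every core to each bank in an n x n mesh (n >= 2)."""
    nodes = list(product(range(n), repeat=2))
    return {
        bank: sum(abs(bank[0] - c[0]) + abs(bank[1] - c[1]) for c in nodes) / len(nodes)
        for bank in nodes
    }


def mapping_weights(n):
    """Hypothetical heuristic: give each bank a share of cache blocks
    inversely proportional to its mean distance, so nearer banks hold more."""
    inverse = {bank: 1.0 / d for bank, d in mean_hop_distance(n).items()}
    total = sum(inverse.values())
    return {bank: v / total for bank, v in inverse.items()}


def xy_link_loads(n):
    """Count how many source/destination pairs cross each directed link
    under XY routing (travel along x first, then along y)."""
    nodes = list(product(range(n), repeat=2))
    loads = {}
    for (sx, sy), (dx, dy) in product(nodes, nodes):
        x, y = sx, sy
        while x != dx:  # x dimension first
            nxt = x + (1 if dx > x else -1)
            link = ((x, y), (nxt, y))
            loads[link] = loads.get(link, 0) + 1
            x = nxt
        while y != dy:  # then y dimension
            nxt = y + (1 if dy > y else -1)
            link = ((x, y), (x, nxt))
            loads[link] = loads.get(link, 0) + 1
            y = nxt
    return loads


if __name__ == "__main__":
    dist = mean_hop_distance(4)
    print("corner bank mean distance:", dist[(0, 0)])
    print("central bank mean distance:", dist[(1, 1)])
    loads = xy_link_loads(4)
    print("edge link load:", loads[((0, 0), (1, 0))])
    print("central link load:", loads[((1, 0), (2, 0))])
```

Under these assumptions, corner banks are on average farther from the cores than central banks, and links near the middle of the mesh carry more flows than links at the edge. That is the kind of imbalance a position-dependent block mapping and a load-dependent channel allocation would aim to offset.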

Keywords: chip multi-processor; non-uniform cache architecture; networks-on-chip; equalization; cache access

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
