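Affiliations: [1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049; [3] Loongson Technology Corporation Limited, Beijing 100095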
Source: Chinese Journal of Computers (《计算机学报》), 2022, No. 10, pp. 2207-2220 (14 pages)
Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Category C), Grant No. XDC05020100.
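Abstract: As the integration density, die area, and operating frequency of high-performance processors keep increasing, dynamic clock power grows exponentially, and uneven clock distribution significantly raises the cost of synchronizing across clock domains. These problems have become bottlenecks for improving processor energy efficiency. Processor cores typically account for more than 70% of the total power of a multi-core processor, and clock power is the main component of core power. Digital dynamic frequency scaling (DFS) must raise a clock interrupt exception and reconfigure the registers of the phase-locked loop in the clock generation module in order to lower the frequency, which costs the system a wait on the millisecond scale or longer. Analog, continuously adaptive frequency scaling (AFS) exhibits frequency overshoot while the frequency changes, which adds pressure on physical timing design. At the same time, power reduction must not come at the expense of performance. Long on-chip clock distribution delays vary with PVT (process, voltage, temperature) and produce uncertain clock phase deviation; the timing margin that physical design adds to compensate directly affects processor performance. This paper proposes a new synchronous intermittent clock system based on a decoupled de-skew phase-locked loop (De-skew PLL), implements the De-skew PLL in a 12 nm CMOS process, and evaluates the timing performance and clock power of the whole system. On the one hand, the system uses the De-skew PLL's remote clock feedback to align the phases of different clock domains in real time and to resist PVT variation of the clock distribution delay inside the feedback loop. On the other hand, the newly added decoupling module lets the clock respond, without frequency overshoot, to the intermittent clock control signals (intermittent suppression of clock pulses) generated inside the processor core to lower the frequency, achieving sub-nanosecond control of dynamic clock power. Taking a synchronous cascaded structure in the 12 nm process as an example, the synchronization skew of each clock distribution level after calibration is less than 10 ps. The effect on core clock power was evaluated by running the SPEC CPU 2000 suite on the RTL of the 16-core LS3C5000 processor on a simulation acceleration platform and was further verified by PTPX post-layout simulation; the results show average power savings of more than 4.5% for integer programs and more than 20.3% for floating-point programs.

Abstract (original English version): With the increase in processor integration, area, and operating frequency, clock power consumption is increasing exponentially, and the cost of synchronizing across clock domains is becoming serious because of non-uniform clock distribution. Both issues have become bottlenecks that restrict processor energy efficiency. Normally, processor cores account for more than 70% of the total power of a multi-core processor, and clock power is the main component of core power. DFS (Dynamic Frequency Scaling) has to trigger clock interrupt exceptions and reconfigure the relevant registers of the PLL (Phase Locked Loop), and this state change results in millisecond-level system waiting time. AFS (Adaptive Frequency Scaling), without system control, continually adjusts the operating frequency by tracking changes in the power supply level, but frequency overshoot cannot be avoided during tuning, which imposes extra physical timing constraints. Low-power design of the clock system must not come at the expense of processor performance. Clock distribution delay varies with PVT (Process, Voltage, Temperature), and increasing the timing margin to compensate for the resulting clock phase differences directly affects the timing performance of critical paths. In this paper, a new synchronized intermittent clock system based on a decoupled De-skew PLL is first proposed; then a De-skew PLL that supports stable phase-error calibration is realized in a 12 nm CMOS process; finally, the timing performance and clock power consumption of the whole system are evaluated. On the one hand, the new clock system structure can realize real-time phase alignment between different clock domains through the De-skew PLL's remote feedback and is immune to PVT variation of the clock tree delay through real-time in-loop tracking; on the other hand, the De-skew PLL's decoupling module can decouple the clock tree frequency from the PLL's loop configuration without loss of lock, w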
Keywords: multi-core processor; synchronous intermittent clock system; decoupled de-skew phase-locked loop; low-power design
Classification: TP301 [Automation and Computer Technology: Computer System Architecture]
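
Illustrative note: the abstract's intermittent clock (suppressing individual clock pulses to lower the effective frequency) can be related to clock power with the standard first-order CMOS relation P = a * C * V^2 * f. The sketch below is not the paper's model or evaluation flow. The function names, the capacitance and voltage values, and the "keep `kept` of every `window` pulses" pattern are all hypothetical, chosen only to show why dropping pulses reduces dynamic clock power roughly in proportion to the pulses removed.

```python
def effective_frequency(base_hz: float, kept: int, window: int) -> float:
    """Average clock rate when only `kept` of every `window` pulses are delivered.

    Hypothetical pulse-suppression pattern, used for illustration only.
    """
    if window <= 0 or not 0 <= kept <= window:
        raise ValueError("require window > 0 and 0 <= kept <= window")
    return base_hz * kept / window


def clock_dynamic_power(cap_farads: float, vdd_volts: float,
                        freq_hz: float, activity: float = 1.0) -> float:
    """First-order dynamic power: P = activity * C * Vdd^2 * f (watts)."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz


def saving_fraction(kept: int, window: int) -> float:
    """Fraction of dynamic clock power saved under this simplified model."""
    return 1.0 - effective_frequency(1.0, kept, window)


# Assumed example values (not taken from the paper).
BASE_HZ = 2.0e9      # 2 GHz clock
CAP_F = 1.0e-9       # 1 nF of switched clock-network capacitance
VDD = 0.8            # supply voltage in volts

full = clock_dynamic_power(CAP_F, VDD, BASE_HZ)
gated = clock_dynamic_power(CAP_F, VDD, effective_frequency(BASE_HZ, 3, 4))
print(f"full-rate: {full:.3f} W, 3-of-4 pulses: {gated:.3f} W")
```

Under this model the supply voltage and the PLL output frequency stay fixed, so the saving comes only from the missing clock edges; leakage and non-clock logic power are ignored.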