Research and Design of XOR-Hash Indexing for Shared Instruction Cache (共享指令缓存XOR散列索引的研究与设计)  Cited by: 2


Authors: LIU Xiao [1], TANG Yong [1], ZHENG Fang [1], DING Ya-Jun [1] (Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083)

Affiliation: [1] Jiangnan Institute of Computing Technology

Source: Chinese Journal of Computers (《计算机学报》), 2019, No. 11, pp. 2499-2511 (13 pages)

Funding: Supported by the National Key Research and Development Program of China (2016YFB0200500)

Abstract: SPMD (Single Program Multiple Data) is one of the main execution modes in high-performance computing. In this mode, adjacent cores execute the same program blocks, but because of differences in the data they process or in their control flow, their instruction streams are not identical. Sharing the L1 ICache (instruction cache) among adjacent cores exploits this characteristic of SPMD execution on many-core processors and also eases the pressure on on-chip resources. However, the shared structure introduces access conflicts, which hurt performance. This paper gives a theoretical analysis of access conflicts in the shared ICache based on a queuing network. The analysis models the cores' access behavior toward the shared ICache banks, which avoids the blurred memory-access characteristics that result from directly abstracting physical nodes. Based on the causes of instruction cache performance loss derived from the theory, the paper designs a low-conflict XOR hash function for the shared L1 ICache. The design weighs search cost against engineering implementation complexity, and keeps the additional latency and power overhead under control while preserving the random hashing capability of the hash's linear space. The hash function is built on XOR operations and reduces access conflicts in the shared L1 ICache by adjusting the node transition probabilities of the ICache queuing network model. Experimental results show that, on a four-core cluster with a total instruction cache capacity of 32 KB, the shared L1 ICache with XOR hashing improves performance by 11% on average over private L1 ICaches, by 8% on average over a shared L1 ICache using low-order interleaving, and by 3.2% on average over a shared L1 ICache using a stride-oriented hashing strategy.

English abstract: Single program multiple data (SPMD) is a main execution mode in the high-performance computing domain. While processing the same program segment, each adjacent core's execution varies depending on the data it processes and its own control flow. Many-core processors have been widely used in high-performance computing for their advantages in peak performance, computational density, and energy efficiency. Because a many-core processor integrates more cores and larger-scale logic into a single chip, it places stricter requirements on power and area cost while still having to ensure performance. Sharing the L1 instruction cache across adjacent cores effectively exploits the SPMD execution mode. The shared instruction cache also alleviates the strain on on-chip resources. However, the sharing structure has a negative impact on performance, caused by access conflicts in the shared instruction cache. In this paper, we first give a theoretical analysis of access conflicts in the shared instruction cache based on a queuing network. Rather than mapping the physical banks of the instruction cache to queuing nodes, we model the shared instruction cache according to the cores' instruction fetch pattern. A queuing network reflects the steady-state performance of a system as time tends to infinity. Over such a long period, however, the access frequencies of the instruction cache banks tend to be the same. In other words, a queuing network built on physical cache banks may not precisely reflect the intensive conflicts on each bank. By modeling the cores' instruction fetch pattern instead, the theoretical analysis captures the access characteristics of the shared instruction cache more accurately. The model of the shared instruction cache given in this paper is later verified by simulation results. Based on the causes of performance loss identified in the theoretical analysis, we then design an XOR-hash function to minimize access conflicts in the shared L1 instruction cache. In the design of the XOR-hash function, we accelerate th
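
The abstract contrasts low-order interleaving with XOR-based bank indexing but does not give the hash function itself. The following is only a minimal sketch of the general idea, not the paper's design: it assumes 4 banks and 64-byte cache lines, and uses a generic XOR fold of the line address. The names `low_order_bank`, `xor_bank`, and `bank_histogram` are hypothetical and chosen for illustration.

```python
from collections import Counter

LINE_BYTES = 64                          # assumed cache-line size
NUM_BANKS = 4                            # assumed number of shared ICache banks (power of two)
BANK_BITS = NUM_BANKS.bit_length() - 1   # index bits per bank selection (2 here)


def low_order_bank(addr):
    """Low-order interleaving: the bank is the lowest bits of the line address."""
    return (addr // LINE_BYTES) % NUM_BANKS


def xor_bank(addr):
    """Illustrative XOR hash: fold the line address in BANK_BITS-wide chunks.

    This is a generic XOR fold, not the function designed in the paper.
    """
    line = addr // LINE_BYTES
    bank = 0
    while line:
        bank ^= line & (NUM_BANKS - 1)
        line >>= BANK_BITS
    return bank


def bank_histogram(addrs, index_fn):
    """Count how many of the given fetch addresses map to each bank."""
    counts = Counter(index_fn(a) for a in addrs)
    return [counts.get(b, 0) for b in range(NUM_BANKS)]


# Four cores fetching lines that are exactly NUM_BANKS lines apart:
# a pattern that collides on a single bank under low-order interleaving.
stride = NUM_BANKS * LINE_BYTES
addrs = [core * stride for core in range(4)]

print(bank_histogram(addrs, low_order_bank))  # [4, 0, 0, 0]
print(bank_histogram(addrs, xor_bank))        # [1, 1, 1, 1]
```

In this contrived access pattern, low-order interleaving sends all four fetches to bank 0, while the XOR fold spreads them over the four banks. Real instruction streams will behave differently, and the improvements reported in the abstract depend on the paper's own function and workloads.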

Keywords: single program multiple data (SPMD) model; instruction cache; many-core processor; queuing network model; XOR hash function

Classification number: TP302 [Automation and Computer Technology: Computer System Architecture]

 
