Design and implementation of a novel off-chip memory access path for graph computing


Authors: ZHANG Xu [1,2]; CHANG Yisong; ZHANG Ke [1,2,3]; CHEN Mingyu

Affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] Peng Cheng Laboratory, Shenzhen, Guangdong 518000, China

Source: Journal of National University of Defense Technology, 2020, No. 2, pp. 13-22 (10 pages)

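Funding: National Key Research and Development Program of China (2017YFB1001602); National Natural Science Foundation of China (61702485); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017143).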

Abstract: To address the memory access characteristics of graph computing applications, this paper proposes and implements a high-concurrency memory access module, the High Concurrency and high Performance Fetcher (HCPF), which supports highly concurrent, out-of-order, and asynchronous off-chip memory access. Through hardware-software co-design, HCPF can handle 192 memory access requests of 8 types simultaneously, with a user-defined access granularity, meeting the demand of graph applications for massive, low-latency, fine-grained data access. HCPF also extends a custom memory-semantic interconnect across computing nodes to support fine-grained direct access to remote memory, laying the technical groundwork for a future distributed graph computing framework. Combining these two contributions, a RISC-V system-on-chip (SoC) architecture supporting HCPF is designed and implemented around a pipelined RISC-V processor core, an FPGA-based prototype platform is built, and HCPF is given a preliminary performance evaluation using in-house test programs. Experimental results show that, compared with the original memory access path, HCPF raises the performance of array-based and random-address-based random memory access to as much as 3.5 times and 2.7 times, respectively. The latency of directly accessing 4 bytes of data in remote memory is only 1.63 μs.

Keywords: memory-level parallelism; memory access path; graph computing applications

Classification number: TN95 [Electronics and Telecommunications: Signal and Information Processing]

 
