Extending the GA Model for Heterogeneous Architectures  Cited by: 1

Extending Global Arrays on Heterogeneous System


Authors: Cheng Peng [1,2]; Lu Yutong [1,2]; Gao Tao [1,2]; Wang Chenxu [1,2]

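Affiliations: [1] State Key Laboratory of High Performance Computing (National University of Defense Technology), Changsha 410073 [2] College of Computer, National University of Defense Technology, Changsha 410073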

Source: Journal of Computer Research and Development (计算机研究与发展), 2017, No. 4, pp. 804-812 (9 pages)

Funding: National Natural Science Foundation of China (61120106005)

Abstract: The growing demand for computational performance in scientific and engineering applications has driven the rapid development of heterogeneous computing. Heterogeneous programming is harder, however, because the CPU and accelerators do not share memory: programmers must distinguish local from remote data accesses and explicitly specify how data moves between devices. The Global Arrays (GA) model, built on the Aggregate Remote Memory Copy Interface (ARMCI), provides an asynchronous, one-sided, shared-memory programming environment for distributed-memory systems, but the complexity of extending ARMCI makes it hard to produce an efficient and scalable GA implementation quickly for a new platform that exploits that platform's characteristics. This paper presents CoGA, an extension of GA to heterogeneous systems consisting of CPUs and Intel Many Integrated Core (MIC) coprocessors. Built on top of the Symmetric Communication Interface (SCIF), CoGA manages CPU and MIC memory and provides a global array abstraction across the two, hiding data-transfer details so that programmers can access shared data regardless of where it resides. CoGA also exploits the characteristics of SCIF remote memory access to optimize data transfer between the CPU and MIC. Evaluation of data transfer bandwidth, communication latency, and a sparse matrix-vector multiplication problem shows that CoGA is practical and effective at simplifying programming and optimizing data transfer performance.
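The global array idea described in the abstract, where callers address one logical array and the runtime decides which device's memory holds each element, can be illustrated with a small sketch. This is not CoGA's API or implementation, which the abstract does not give. The class `ToyGlobalArray`, its `put` and `get` methods, and the block split between a "host" and a "coprocessor" buffer are all hypothetical, and plain Python lists stand in for device memory and for SCIF transfers.

```python
class ToyGlobalArray:
    """Toy one-dimensional global array split across two memories.

    Hypothetical illustration only: plain lists stand in for CPU and
    coprocessor memory; no real device or SCIF call is involved.
    """

    def __init__(self, length, host_share):
        if not 0 <= host_share <= length:
            raise ValueError("host_share must be between 0 and length")
        self.length = length
        self.split = host_share  # global indices [0, split) live on the host
        self.host = [0.0] * host_share
        self.coproc = [0.0] * (length - host_share)

    def _locate(self, index):
        """Map a global index to (owning buffer, local index)."""
        if not 0 <= index < self.length:
            raise IndexError("global index out of range")
        if index < self.split:
            return self.host, index
        return self.coproc, index - self.split

    def put(self, start, values):
        """Write values into consecutive global indices beginning at start."""
        for offset, value in enumerate(values):
            buf, local = self._locate(start + offset)
            buf[local] = value

    def get(self, start, count):
        """Read count values from consecutive global indices beginning at start."""
        out = []
        for i in range(start, start + count):
            buf, local = self._locate(i)
            out.append(buf[local])
        return out


ga = ToyGlobalArray(length=8, host_share=5)
ga.put(3, [1.0, 2.0, 3.0, 4.0])  # this write spans the host/coprocessor boundary
window = ga.get(2, 6)            # the caller never names the owning buffer
```

The calling code in the last three lines uses only global indices; routing each access to the right buffer is the kind of detail the abstract says CoGA hides from the programmer.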

Keywords: Intel Xeon Phi (MIC); global arrays; symmetric communication interface; heterogeneous computing; programming model

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
