检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:程鹏[1,2] 卢宇彤[1,2] 高涛[1,2] 王晨旭[1,2] Cheng Peng;Lu Yu tong;Gao Tao;Wang Chenxu(State Key Laboratory of High Performance Computing (National University of Defense Technology) , Changsha 410073;College of Computer, National University of Defense Technology , Changsha 410073)
机构地区:[1]高性能计算国家重点实验室(国防科学技术大学),长沙410073 [2]国防科学技术大学计算机学院,长沙410073
出 处:《计算机研究与发展》2017年第4期804-812,共9页Journal of Computer Research and Development
基 金:国家自然科学基金项目(61120106005)~~
摘 要:科学与工程应用对计算性能要求的不断增加使得异构计算得到了迅速发展,然而CPU与加速单元之间没有共享内存的特点增加了异构编程难度,编程人员必须显式地指定数据在不同设备之间的传递情况.全局数组(global arrays,GA)模型基于聚合远程内存拷贝接口(ARMCI)为分布式存储系统提供异步单边通信、共享内存的编程环境,但ARMCI接口拓展的复杂性使得GA不能根据特定计算平台的特点迅速在该平台上实现.CoGA模型是对GA模型的异构拓展,旨在为CPU+英特尔至强融核(MIC)的异构系统提供全局数组结构,隐藏数据传输细节从而简化异构编程难度.CoGA基于MIC上的对称传输接口(SCIF)实现对CPU和MIC的内存管理,并结合SCIF远程内存访问特点优化CPU与MIC间的数据传输性能.最后,通过数据传输带宽、通信延迟和稀疏矩阵乘问题的测试,证明了CoGA简化编程并优化数据传输性能的有效性和实用性.The increasing requirement for computational performance has led to the rapid development of heterogeneous computing.However,heterogeneous programming is more complicated since there is no shared memory between CPU and accelerators.Besides,programmers must distinguish the local or remote access of data and transmit the data between computing devices explicitly.Global arrays(GA)can provide an asynchronous one-sided,shared memory programming environment for distributed memory systems,but creating an efficient and scalable implementation of GA for a new system is a challenge because of the sophistication of communication library inside GA.In this paper,we present CoGA,the extension of GA on heterogeneous systems consist of CPU and Intel many integrated core(MIC).CoGA,which is built on the top of symmetric communication interface(SCIF),can provide a shared memory abstraction between CPU and MIC,and simplify the programming by allowing programmers to access the shared data regardless where the referenced data is located.Furthermore,CoGA takes advantage of SCIF remote memory access and optimizes the data transmission performance between CPU and MIC.The evaluation on data transmission bandwidth,communication latency and sparse-matrix vector multiplication problem proves that CoGA is practical and effective.
关 键 词:至强融核 全局数组 对称传输接口 异构计算 编程模型
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.222