Authors: WU Mingchuan (伍明川), LIU Ying (刘颖) [1], LI Limin (李立民), FENG Xiaobing (冯晓兵) [1,2]
Affiliations: [1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049
Source: Chinese High Technology Letters (《高技术通讯》), 2022, No. 9, pp. 927-936 (10 pages)
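Funding: Supported by the National Key Research and Development Program of China (2016YFB0200803), the General Program of the National Natural Science Foundation of China (61872043), and the Young Scientists Fund of the National Natural Science Foundation of China (61802368).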
Abstract: In recent years, demand for high performance computing has grown significantly across scientific domains, and making effective use of the computing power of new supercomputing architectures has become a research focus. The Sunway TaihuLight supercomputer, developed in China, uses the domestically designed heterogeneous many-core processor SW26010, which contains four core groups (CGs) but provides no synchronization mechanism between them. To make the processor easier to program, this paper proposes an inter-CG synchronization method for Sunway TaihuLight and implements it in the SWCL OpenCL compiler. The method uses data dependency analysis across the OpenCL host and kernel code to identify where synchronization operations are necessary, and uses the cross segment of the SW26010 for low-overhead communication between core groups. As a result, a single OpenCL kernel program can be deployed automatically to multiple core groups without the programmer using the message passing interface (MPI) for explicit synchronization control. Experiments with the OpenCL test cases from SPEC ACCEL 1.2 on the Sunway TaihuLight platform show that the speedup achieved by this method is significantly better than that of the traditional MPI implementation.
Keywords: OpenCL; domestic many-core processor; heterogeneous; synchronization; data dependency analysis
Classification number: TP332 [Automation and Computer Technology: Computer System Architecture]