Authors: JIA Ruipeng, LIN Zhongchao [1], ZUO Sheng, ZHANG Yu [1], YANG Meihong (School of Electronic Engineering, Xidian University, Xi'an 710071, China; School of Computer Science and Technology, Qilu University of Technology, Ji'nan 250000, China)
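Affiliations: [1] School of Electronic Engineering, Xidian University, Xi'an, Shaanxi 710071; [2] School of Computer Science and Technology, Qilu University of Technology, Ji'nan, Shandong 250000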
Source: Journal of Xidian University, 2024, No. 2, pp. 76-83 (8 pages)
Funding: Key Research and Development Program of Shaanxi Province (2023-ZDLGY-09, 2022ZDLGY02-01, 2021GXLH-02); Fundamental Research Funds for the Central Universities (QTZX23018).
Abstract: In view of the development trend of domestic supercomputers built on heterogeneous many-core processors, a massively parallel higher-order method of moments is implemented on a domestic CPU+DCU heterogeneous parallel system. First, the basic implementation strategy for using the DCU (deep computing unit) to accelerate the method of moments is given. Building on the load-balancing strategy of the homogeneous parallel method of moments, an efficient heterogeneous parallel programming framework, "MPI+OpenMP+DCU", is proposed to address the mismatch between computing tasks and computing power, achieving load balance in the heterogeneous parallel computation. In addition, a fine-grained task division strategy and asynchronous communication are adopted to pipeline the DCU computation process, so that computation and communication overlap and the efficiency of heterogeneous cooperative computing improves. The accuracy of the CPU+DCU heterogeneous parallel method of moments is verified by comparing its simulation results with those of the finite element method. Scalability analysis on the domestic DCU heterogeneous platform shows that, compared with CPU-only computation, the CPU+DCU heterogeneous cooperative computation achieves a speedup of 5.5 to 7.0 times at different parallel scales. The program can also run on the full system of the National Supercomputing Center in Xi'an: when the parallel scale is extended from 360 nodes to 3,600 nodes (1,036,800 processor cores in total), the parallel efficiency reaches approximately 73.5%.
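The scaling figures in the abstract can be related to one another with a small calculation. The sketch below assumes the 73.5% figure is a relative strong-scaling efficiency measured against the 360-node run; the abstract does not state the exact definition used, and it reports no run times, so the timings in the example are hypothetical placeholders chosen only to reproduce the reported percentage. The function name is likewise an illustration, not something taken from the paper.

```python
def relative_parallel_efficiency(base_nodes, base_time, scaled_nodes, scaled_time):
    """Relative strong-scaling efficiency of a scaled run against a baseline run.

    Assumed definition (not confirmed by the abstract):
    efficiency = (base_time / scaled_time) / (scaled_nodes / base_nodes)
    """
    measured_speedup = base_time / scaled_time
    ideal_speedup = scaled_nodes / base_nodes
    return measured_speedup / ideal_speedup


# Node and core counts taken from the abstract.
BASE_NODES = 360
SCALED_NODES = 3600
TOTAL_CORES = 1_036_800

# Cores per node implied by the reported totals.
cores_per_node = TOTAL_CORES // SCALED_NODES

# Hypothetical timings: the baseline time is arbitrary, and the scaled time is
# chosen so that the measured speedup is 7.35x on 10x the nodes.
hypothetical_base_time = 100.0
hypothetical_scaled_time = hypothetical_base_time / 7.35

efficiency = relative_parallel_efficiency(
    BASE_NODES, hypothetical_base_time, SCALED_NODES, hypothetical_scaled_time
)
print(f"cores per node: {cores_per_node}")
print(f"relative parallel efficiency: {efficiency:.3f}")
```

Under this assumed definition, an efficiency of about 73.5% on ten times as many nodes corresponds to a speedup of roughly 7.35 times over the 360-node run.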
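Keywords: higher-order method of moments; domestic heterogeneous parallel system; deep computing unit (DCU); heterogeneous cooperative parallel computing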
Classification code: TN820 [Electronics and Telecommunications: Information and Communication Engineering]