Optimization of HPL on Complex Heterogeneous Computing System

Cited by: 2

Authors: LI Lei-Sheng[1,2], YANG Wen-Hao, MA Wen-Jing[1,2], ZHANG Ya, ZHAO Hui, ZHAO Hai-Tao[1,2], LI Hui-Yuan[1,2], SUN Jia-Chang[1,2]

Affiliations: [1] Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; [2] State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China

Source: Journal of Software, 2021, Issue 8, pp. 2307-2318 (12 pages)

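Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200); National Key Research and Development Program of China (2018YFB0204404, 2016YFB0200601); National Natural Science Foundation of China (11871455, 11971016).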

Abstract: Mainstream supercomputers increasingly adopt heterogeneous systems with accelerators. As the floating-point performance of accelerators keeps rising, the CPU, memory, bus, network, and overall system architecture of each compute node must keep pace with it. High Performance Linpack (HPL) is the traditional benchmark for evaluating high-performance computers, and complex heterogeneous systems bring both opportunities and challenges to HPL benchmarking. For heterogeneous supercomputers equipped with GPUs, this paper proposes a new scheme for partitioning computational tasks between the CPU and the accelerators, together with a balance point theory that guides HPL performance optimization. To optimize HPL, it proposes a look-ahead algorithm in which the CPU and the accelerators work cooperatively, and a continuous pipelined row-swap algorithm, together achieving a high degree of parallelism among the accelerators, the CPU, and the network. In addition, new implementations of panel factorization and row swapping are designed for accelerator-equipped systems, raising accelerator utilization. On a system with 4 GPUs per node, single-node HPL efficiency reaches 79.51%.
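The abstract reports its result as a single efficiency percentage. A minimal sketch of how such a figure is usually derived follows, assuming the standard HPL convention: a run of size N is credited with 2/3·N³ + 3/2·N² floating-point operations, Rmax is that count divided by wall time, and efficiency is Rmax/Rpeak. The function names and the hypothetical node figures in the `__main__` block are illustrative only and are not taken from the paper; whether the host CPU's peak is counted in Rpeak is likewise an assumption left to the caller.

```python
"""Sketch: computing a single-node HPL efficiency figure.

Assumption: Rpeak is the sum of the theoretical FP64 peaks of whichever
components the caller chooses to count (e.g. GPUs only, or GPUs + CPU).
All names here are hypothetical helpers, not part of HPL itself.
"""


def hpl_flops(n: int) -> float:
    # Operation count HPL credits for an n x n dense solve:
    # 2/3 n^3 (LU factorization) plus 3/2 n^2 (lower-order terms).
    return (2.0 / 3.0) * n ** 3 + (3.0 / 2.0) * n ** 2


def hpl_rmax_gflops(n: int, seconds: float) -> float:
    # Achieved rate (Rmax) in GFLOPS for one HPL run of size n.
    if seconds <= 0:
        raise ValueError("run time must be positive")
    return hpl_flops(n) / seconds / 1e9


def rpeak_gflops(component_peaks: list[float]) -> float:
    # Theoretical peak: sum of the per-component FP64 peaks (GFLOPS).
    return sum(component_peaks)


def hpl_efficiency(rmax: float, rpeak: float) -> float:
    # Efficiency is the fraction of theoretical peak actually achieved.
    return rmax / rpeak


if __name__ == "__main__":
    # Hypothetical node: 4 GPUs at 7000 GFLOPS each, CPU peak not counted.
    peak = rpeak_gflops([7000.0] * 4)
    # Hypothetical run: N = 100000 finishing in 30 seconds.
    rmax = hpl_rmax_gflops(100_000, 30.0)
    print(f"Rmax = {rmax:.1f} GFLOPS, efficiency = {hpl_efficiency(rmax, peak):.2%}")
```

Because the N³ term dominates, efficiency on a fixed node is driven almost entirely by how fast the trailing-matrix updates run, which is why the paper's optimizations concentrate on keeping the accelerators busy.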

Keywords: complex heterogeneous system; balance point theory; panel factorization acceleration; continuous pipeline algorithm

CLC number: TP303 [Automation and Computer Technology / Computer System Architecture]
