HPL Approach for Heterogeneous Computer Platforms


Authors: SUN Qiao[1], SUN Jia-Chang[1], MA Wen-Jing[1,2], ZHAO Yu-Wen[1,3]

Affiliations: [1] Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; [2] State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; [3] University of Chinese Academy of Sciences, Beijing 100049, China

Source: Journal of Software, 2021, No. 8, pp. 2329-2340 (12 pages)

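Funding: National Key Research and Development Program of China (2018YFB0204404); Strategic Priority Research Program of the Chinese Academy of Sciences, Category C (XDC01030200).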

Abstract: HPL (High Performance Linpack) is a widely used benchmark for measuring computer performance. For decades, both industry and academia have paid close attention to optimizing and tuning HPL for each new generation of computer platforms, so that the benchmark fully reflects the performance of contemporary cutting-edge systems. For today's mainstream heterogeneous HPC platforms with multiple accelerating co-processors, this paper proposes Hetero-HPL, an approach to a high-performance HPL implementation. In Hetero-HPL, the mapping between the set of processes and the set of (co-)processors is adjustable, so when HPL runs on a single node it avoids inter-process data transfer entirely, and every important step of the HPL algorithm can make full use of the node's hardware resources, such as memory capacity, CPU cores, co-processors, and the PCI-e bus. Hetero-HPL introduces no redundant computation or communication. For any number of devices, its working set is not restricted by the size limit of a single pinned-memory allocation, and the data are distributed so that the workload is balanced among all co-processors and massive fine-grained parallelism can be exploited for efficient large-scale homogeneous computation within each device. On one experimental platform with four co-processors, Hetero-HPL reaches 76.5% of the peak performance of a single computing node (the dgemm function reaches 84% efficiency). Further experiments suggest that Hetero-HPL is also a feasible approach when running across multiple nodes.
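The 76.5% and 84% figures above are efficiencies: achieved performance divided by theoretical peak (Rmax / Rpeak). The short Python sketch below shows that arithmetic for an HPL run. It assumes the operation count 2/3·N³ + 3/2·N², which, to our understanding, is what the reference HPL driver uses to convert solve time into Gflop/s. The function names `hpl_gflops` and `efficiency` and the sample inputs are hypothetical and are not taken from the paper.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflop/s rate for solving an n x n dense system in `seconds`.

    Assumption: flop count 2/3*n^3 + 3/2*n^2, as in the reference HPL driver.
    """
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Fraction of theoretical peak achieved (Rmax / Rpeak)."""
    return rmax_gflops / rpeak_gflops


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not results from the paper.
    n, seconds, rpeak = 100_000, 100.0, 10_000.0
    rmax = hpl_gflops(n, seconds)
    print(f"Rmax = {rmax:.1f} Gflop/s, efficiency = {efficiency(rmax, rpeak):.1%}")
```

Because the N³ term dominates, efficiency is governed almost entirely by how well the trailing-matrix update (dgemm) runs, which is why the dgemm efficiency bounds the overall figure from above.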

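The abstract states that Hetero-HPL balances the workload among all co-processors, but it does not describe the data layout. As an illustration only, the Python sketch below assumes a 1-D cyclic distribution of NB-wide column blocks over D devices, a common choice in dense LU codes but not necessarily the layout Hetero-HPL uses. It counts how many trailing-matrix column blocks each device must update after each panel factorization. The helper names `owner_device` and `trailing_load` and the sizes are hypothetical.

```python
from collections import Counter


def owner_device(block_col: int, num_devices: int) -> int:
    """Device owning an NB-wide column block (assumed 1-D cyclic layout)."""
    return block_col % num_devices


def trailing_load(n: int, nb: int, num_devices: int, step: int) -> list[int]:
    """Column blocks each device updates after factoring panel `step`.

    The trailing update touches column blocks step+1 .. nblocks-1;
    we count how many of those each device owns under the cyclic layout.
    """
    nblocks = (n + nb - 1) // nb
    counts = Counter(owner_device(j, num_devices) for j in range(step + 1, nblocks))
    return [counts[d] for d in range(num_devices)]


if __name__ == "__main__":
    # Hypothetical sizes; 4 devices matches the co-processor count in the abstract.
    n, nb, devices = 4096, 256, 4
    for step in (0, 5, 12):
        print(step, trailing_load(n, nb, devices, step))
```

With this layout the per-device counts differ by at most one block at every step. A contiguous split, where each device owns a fixed range of columns, would instead leave the devices holding the leftmost columns idle as the factorization proceeds.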
Keywords: HPL (high performance Linpack); multi-device heterogeneous platform; parallel computing

CLC classification: TP303 (Automation and Computer Technology / Computer System Architecture)

 
