CPU-side High Performance BLAS Library Optimization in Heterogeneous HPL Algorithm

Cited by: 2


Authors: CAI Yu (蔡雨); SUN Cheng-Guo (孙成国); DU Zhao-Hui (杜朝晖); LIU Zi-Xing (刘子行); KANG Meng-Bo (康梦博); LI Shuang-Shuang (李双双)

Affiliation: [1] Information Technology Co., Ltd., Suzhou, Jiangsu 215000, China

Source: Journal of Software (软件学报), 2021, No. 8, pp. 2289-2306 (18 pages)

Abstract: Improving the efficiency of heterogeneous HPL (High-Performance Linpack) requires making full use of the computing power of both the accelerators and the general-purpose CPU. The accelerators integrate more computing cores and carry out most of the computation, while the general-purpose CPU handles task scheduling and also takes part in the computation. Provided that tasks are divided sensibly and the load is balanced, optimizing CPU-side computing performance is particularly important for overall efficiency. Optimizing BLAS (Basic Linear Algebra Subprograms) functions for the architectural characteristics of a specific platform often exploits the CPU's computing power more fully and improves overall system efficiency. BLIS (BLAS-like Library Instantiation Software) is an open-source BLAS framework that is easy to develop with, portable, and modular. Based on the architecture of the heterogeneous platform and the characteristics of the HPL algorithm, this study uses the three-level cache, vectorized instructions, and multi-threaded parallelism to optimize the BLAS functions at each level that are called on the CPU side, and applies auto-tuning to optimize the matrix blocking parameters, producing HBLIS, a BLIS library optimized for heterogeneous environments. Compared with MKL, HPL with HBLIS improves overall performance by 11.8%.
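To make the idea of matrix blocking parameters and their auto-tuning more concrete, here is a minimal sketch in pure Python. It is not the HBLIS implementation. The function names and the block-size names `mc`, `nc`, and `kc` are assumptions chosen for illustration, and the tuner uses a plain exhaustive timing search in place of the genetic-algorithm auto-tuning named in the keywords.

```python
import itertools
import random
import time


def gemm_naive(A, B, m, n, k):
    """Reference product C = A * B on row-major nested lists."""
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def gemm_blocked(A, B, m, n, k, mc, nc, kc):
    """C = A * B computed tile by tile.

    mc, nc, kc (hypothetical names) are the tile sizes along the m, n and k
    dimensions. In a tuned native library these would be chosen so that the
    tiles fit in the cache hierarchy; here they only set the loop structure.
    """
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, nc):              # column panel of B and C
        j_end = min(jc + nc, n)
        for pc in range(0, k, kc):          # slice of the inner dimension
            p_end = min(pc + kc, k)
            for ic in range(0, m, mc):      # row panel of A and C
                i_end = min(ic + mc, m)
                # Update one mc x nc tile of C with an mc x kc by kc x nc product.
                for i in range(ic, i_end):
                    row_a, row_c = A[i], C[i]
                    for p in range(pc, p_end):
                        a_ip = row_a[p]
                        row_b = B[p]
                        for j in range(jc, j_end):
                            row_c[j] += a_ip * row_b[j]
    return C


def tune_blocks(m, n, k, candidates, seed=0):
    """Time every (mc, nc, kc) combination and return the fastest one.

    Exhaustive search is used for simplicity; it stands in for a smarter
    search strategy such as a genetic algorithm.
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(k)] for _ in range(m)]
    B = [[rng.random() for _ in range(n)] for _ in range(k)]
    best, best_time = None, float("inf")
    for mc, nc, kc in itertools.product(candidates, repeat=3):
        start = time.perf_counter()
        gemm_blocked(A, B, m, n, k, mc, nc, kc)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = (mc, nc, kc), elapsed
    return best, best_time
```

Calling `tune_blocks(64, 64, 64, [8, 16, 32])` returns the fastest block-size triple measured on the current machine, together with its run time. In an interpreted language the timings mostly reflect loop overhead rather than cache behavior, so the sketch shows only the structure of the search, not realistic tuning results.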

Keywords: BLAS; genetic algorithm auto-tuning; vectorized instructions; data prefetching; multi-threaded parallelism

Classification code: TP303 [Automation and Computer Technology: Computer System Architecture]

 
