Authors: CAI Yu (蔡雨); SUN Cheng-Guo (孙成国); DU Zhao-Hui (杜朝晖); LIU Zi-Xing (刘子行); KANG Meng-Bo (康梦博); LI Shuang-Shuang (李双双) (Information Technology Co., Ltd., Suzhou 215000, China)
Source: Journal of Software (《软件学报》), 2021, Issue 8, pp. 2289-2306 (18 pages)
Abstract: Improving the efficiency of heterogeneous HPL (High-Performance Linpack) requires making full use of the computing power of both the accelerators and the general-purpose CPU. The accelerators integrate more computing cores and carry out most of the computation, while the CPU handles task scheduling and also takes part in the computation. Provided that tasks are divided sensibly and the load is balanced, optimizing CPU-side computing performance is particularly important for overall efficiency. Optimizing the BLAS (Basic Linear Algebra Subprograms) functions for the architectural characteristics of a specific platform can often exploit the CPU's computing power more fully and so improve the efficiency of the whole system. The BLIS (BLAS-like Library Instantiation Software) library is an open-source BLAS framework that is easy to develop with, portable, and modular. Based on the architecture of the heterogeneous platform and the characteristics of the HPL algorithm, this study uses the three-level cache, vectorized instructions, and multithreaded parallelism to optimize the BLAS functions of each level that are called on the CPU side, and applies auto-tuning to optimize the matrix blocking parameters. The result is HBLIS, a BLIS library optimized for heterogeneous environments. Compared with MKL, HPL using HBLIS achieves an 11.8% improvement in overall performance.
Keywords: BLAS; genetic-algorithm auto-tuning; vectorized instructions; data prefetching; multithreaded parallelism
Classification: TP303 [Automation and Computer Technology: Computer System Architecture]