Authors: ZHANG Kun (张琨), JIA Jin-fang (贾金芳), HUANG Jian-qiang (黄建强) [1,2], WANG Xiao-ying (王晓英), YAN Wen-xin (严文昕) [1]
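Affiliations: [1] Department of Computer Technology and Applications, Qinghai University, Xining 810016, China; [2] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China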
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2022, No. 10, pp. 2040-2045 (6 pages)
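Funding: Supported by the Qinghai Province Science and Technology Program, Applied Basic Research Program (2019-ZJ-7034); the National Natural Science Foundation of China (61762074, 62062059); and the Ministry of Education "Chunhui Plan" Research Fund (QDCH2018001).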
Abstract: The conjugate gradient method is one of the important methods for solving symmetric positive definite linear systems, and the systems it is applied to are usually sparse. As problem sizes continue to grow, a single CPU, limited in storage and computing capacity, can no longer meet the real-time requirements of solving large-scale sparse linear systems. This paper therefore proposes an MPI+CUDA heterogeneous parallel solver for CPU+GPU heterogeneous platforms. First, a hotspot performance analysis of the conjugate gradient algorithm is carried out to show the computational difficulties and challenges that arise during the solve. Then, tasks are partitioned according to the characteristics of the conjugate gradient algorithm to obtain the heterogeneous parallel design. Finally, the heterogeneous parallel algorithm is optimized to address its communication overhead, data transfer overhead, and memory access overhead, further improving solution efficiency and performance. Experimental results show that, compared with MPI parallel and CUDALib parallel implementations, the MPI+CUDA heterogeneous hybrid parallel implementation achieves performance improvements of 336% and 33%, respectively, on the Jacobi preconditioned conjugate gradient algorithm, which has a smaller serial portion, and of 25% and 7%, respectively, on the ILU preconditioned conjugate gradient algorithm, which has a larger serial portion. The results also show that the MPI+CUDA hybrid parallel implementation has a certain degree of scalability as the number of nodes increases.
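Keywords: symmetric positive definite linear system; conjugate gradient algorithm; preconditioning technique; heterogeneous parallelism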
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
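
For readers unfamiliar with the Jacobi preconditioned conjugate gradient method named in the abstract, the following is a minimal serial sketch in plain Python. It is not the paper's MPI+CUDA implementation, which this record does not reproduce. The function names (`matvec`, `dot`, `jacobi_pcg`), the dense list-of-lists storage, the tolerance, and the 3x3 test system are all assumptions made for illustration. In implementations of this kind, the matrix-vector product and the dot products are usually the operations considered for offloading or distribution; that is a general observation, not a result taken from the paper.

```python
# Illustrative serial sketch of Jacobi-preconditioned conjugate gradient.
# All names and the dense storage format are assumptions for this example;
# the paper itself targets sparse systems with MPI+CUDA.

def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Returns (x, iterations). The Jacobi preconditioner is M = diag(A),
    so applying M^{-1} is an elementwise division by the diagonal.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # direction update weight
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small symmetric positive definite test system (hypothetical data).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iters = jacobi_pcg(A, b)
residual = max(abs(ax - bi) for ax, bi in zip(matvec(A, x), b))
```

Each iteration performs one matrix-vector product, two dot products for the step coefficients (plus one for the convergence test), and a few vector updates. The preconditioner step is a cheap elementwise scaling, which is consistent with the abstract's description of the Jacobi variant as having a smaller serial portion than the ILU variant.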