Authors: ZHANG Kun; JIA Jinfang; YAN Wenxin; HUANG Jianqiang [1,2]; WANG Xiaoying
Affiliations: [1] Department of Computer Technology and Applications, Qinghai University, Xining 810016, China; [2] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Source: Computer Engineering (《计算机工程》), 2022, No. 1, pp. 149-154, 162 (7 pages)
Funding: National Natural Science Foundation of China (61762074, 62062059); Qinghai Provincial Science and Technology Program (2019-ZJ-7034); Ministry of Education "Chunhui Plan" Research Fund (QDCH2018001).
Abstract: Solving the Helmholtz equation is the core of the dynamic framework of the Global and Regional Assimilation Prediction System (GRAPES) for numerical weather forecasting. The equation can be transformed into the solution of a large-scale sparse linear system, but the solution efficiency is limited by hardware resources and data size and becomes a bottleneck for the computing performance of the system. This paper implements the Generalized Conjugate Residual (GCR) method for solving large-scale sparse linear systems with three parallel approaches (MPI, MPI+OpenMP, and CUDA), and uses an Incomplete LU (ILU) preconditioner to improve the condition number of the coefficient matrix and accelerate the convergence of the iterative method. In the CPU parallel scheme, MPI handles coarse-grained parallelism and communication between processes, while OpenMP uses shared memory to achieve fine-grained parallelism within each process. In the GPU parallel scheme, the CUDA implementation applies optimizations for data transfer, coalesced memory access, and shared memory. Experimental results show that reducing the number of iterations through preconditioning noticeably improves computing performance; the MPI+OpenMP hybrid parallel optimization performs about 35% better than the MPI parallel optimization, and the CUDA parallel optimization performs about 50% better than the MPI+OpenMP hybrid optimization, achieving the best performance overall.
Keywords: sparse linear system; Generalized Conjugate Residual (GCR) method; Message Passing Interface (MPI); OpenMP programming; Compute Unified Device Architecture (CUDA)
CLC Number: TP311.1 [Automation and Computer Technology - Computer Software and Theory]
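The abstract describes fine-grained OpenMP parallelism inside each MPI process and a GPU path built on CUDA. As a purely illustrative aid (not the authors' implementation, whose details are not given in this record), the following is a minimal sketch of an OpenMP-parallel sparse matrix-vector product, the dominant kernel in GCR iterations; the CSR storage layout and all identifiers (csr_spmv, row_ptr, col_idx, val) are assumptions made for this sketch.

/* Minimal sketch: y = A*x for a sparse matrix A stored in CSR format,
 * parallelized with OpenMP. The CSR layout and all names here are
 * illustrative assumptions; this record does not specify the paper's
 * actual storage format or routine names. */
#include <omp.h>

void csr_spmv(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    /* Rows are independent, so they are distributed across threads;
     * this is the kind of fine-grained, loop-level parallelism that
     * OpenMP contributes inside each MPI process in the CPU scheme. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += val[j] * x[col_idx[j]];
        y[i] = sum;
    }
}

In a hybrid scheme of the kind described in the abstract, each MPI rank would own a block of rows and call such a routine on its local block after exchanging the required entries of x with neighboring ranks, while the GPU path would replace this loop with a CUDA kernel tuned for coalesced accesses to val and col_idx.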