Parallel Solution and Optimization of Large-Scale Sparse Linear System in GRAPES Dynamic Framework (Cited by: 2)


Authors: ZHANG Kun (张琨), JIA Jinfang (贾金芳), YAN Wenxin (严文昕), HUANG Jianqiang (黄建强) [1,2], WANG Xiaoying (王晓英) (Department of Computer Technology and Applications, Qinghai University, Xining 810016, China; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)

Affiliations: [1] Department of Computer Technology and Applications, Qinghai University, Xining 810016, China; [2] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Source: Computer Engineering (《计算机工程》), 2022, No. 1, pp. 149-154, 162 (7 pages in total)

Funding: National Natural Science Foundation of China (61762074, 62062059); Science and Technology Program of Qinghai Province (2019-ZJ-7034); "Chunhui Plan" Research Fund of the Ministry of Education (QDCH2018001).

Abstract: Solving the Helmholtz equation is the core of the dynamic framework of the Global/Regional Assimilation and Prediction System (GRAPES) for numerical weather prediction. The problem can be transformed into the solution of a large-scale sparse linear system, but, constrained by hardware resources and data size, its solution efficiency has become a bottleneck for the computing performance of the system. This paper implements the Generalized Conjugate Residual (GCR) method for solving large-scale sparse linear systems with three parallel schemes: MPI, MPI+OpenMP, and CUDA. An incomplete LU factorization (ILU) preconditioner is used to improve the condition number of the coefficient matrix and accelerate the convergence of the iterative method. In the CPU parallel scheme, MPI handles coarse-grained parallelism and communication between processes, while OpenMP exploits shared memory to achieve fine-grained parallelism within each process. In the GPU parallel scheme, the CUDA implementation applies optimizations in data transfer, coalesced memory access, and shared memory. Experimental results show that reducing the number of iterations through preconditioning improves computing performance noticeably; the MPI+OpenMP hybrid parallel optimization performs about 35% better than the MPI parallel optimization, and the CUDA parallel optimization performs about 50% better than the MPI+OpenMP hybrid optimization, achieving the best performance.
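The GPU scheme described in the abstract rests on three CUDA-side measures: staging data transfers, coalescing global-memory accesses, and using shared memory. The sketch below is illustrative only and is not the paper's code; the kernel name spmv_csr_warp, the CSR array names, and the toy 4x4 matrix are assumptions made for this example. It shows a warp-per-row CSR sparse matrix-vector product, the dominant kernel inside each GCR iteration, in which consecutive lanes read consecutive nonzeros (coalesced loads), per-warp partial sums are combined through shared memory, and the static CSR arrays are copied to the device once, outside the iteration loop.

// Illustrative CUDA sketch (not the paper's implementation).
#include <cstdio>
#include <cuda_runtime.h>

#define WARP_SIZE 32

// y = A * x with A stored in CSR format (rowPtr, colIdx, vals).
// One warp per row: consecutive lanes read consecutive vals/colIdx entries,
// so the global-memory loads of each row are coalesced.
__global__ void spmv_csr_warp(int nRows,
                              const int *rowPtr, const int *colIdx,
                              const double *vals, const double *x,
                              double *y)
{
    extern __shared__ double sdata[];       // one slot per thread in the block
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int row  = tid / WARP_SIZE;             // one warp handles one matrix row
    int lane = threadIdx.x % WARP_SIZE;
    if (row >= nRows) return;

    // Each lane accumulates a strided partial sum over the nonzeros of the row.
    double sum = 0.0;
    for (int j = rowPtr[row] + lane; j < rowPtr[row + 1]; j += WARP_SIZE)
        sum += vals[j] * x[colIdx[j]];

    // Combine the 32 partial sums of this warp through shared memory.
    sdata[threadIdx.x] = sum;
    __syncwarp();
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
        if (lane < offset)
            sdata[threadIdx.x] += sdata[threadIdx.x + offset];
        __syncwarp();
    }
    if (lane == 0) y[row] = sdata[threadIdx.x];
}

int main()
{
    // Toy 4x4 sparse matrix (tridiagonal, diagonally dominant), x = (1,1,1,1).
    const int n   = 4;
    const int nnz = 10;
    int    h_rowPtr[] = {0, 2, 5, 8, 10};
    int    h_colIdx[] = {0, 1,  0, 1, 2,  1, 2, 3,  2, 3};
    double h_vals[]   = {4, -1, -1, 4, -1, -1, 4, -1, -1, 4};
    double h_x[]      = {1, 1, 1, 1};
    double h_y[n];

    int *d_rowPtr, *d_colIdx;
    double *d_vals, *d_x, *d_y;
    cudaMalloc(&d_rowPtr, (n + 1) * sizeof(int));
    cudaMalloc(&d_colIdx, nnz * sizeof(int));
    cudaMalloc(&d_vals,   nnz * sizeof(double));
    cudaMalloc(&d_x,      n * sizeof(double));
    cudaMalloc(&d_y,      n * sizeof(double));

    // The matrix does not change during the GCR iteration, so these
    // host-to-device transfers are done once, outside the solver loop.
    cudaMemcpy(d_rowPtr, h_rowPtr, (n + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_colIdx, h_colIdx, nnz * sizeof(int),     cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals,   h_vals,   nnz * sizeof(double),  cudaMemcpyHostToDevice);
    cudaMemcpy(d_x,      h_x,      n * sizeof(double),    cudaMemcpyHostToDevice);

    const int threads = 128;                              // 4 warps per block
    const int blocks  = (n * WARP_SIZE + threads - 1) / threads;
    spmv_csr_warp<<<blocks, threads, threads * sizeof(double)>>>(
        n, d_rowPtr, d_colIdx, d_vals, d_x, d_y);

    cudaMemcpy(h_y, d_y, n * sizeof(double), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("y[%d] = %.1f\n", i, h_y[i]);              // expected: 3 2 2 3

    cudaFree(d_rowPtr); cudaFree(d_colIdx); cudaFree(d_vals);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}

In a full GCR solver this kernel would be launched once per iteration alongside the ILU preconditioner application and the dot products needed for residual orthogonalization; production codes often delegate the SpMV to a tuned library such as cuSPARSE.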

Keywords: sparse linear system; Generalized Conjugate Residual (GCR) method; Message Passing Interface (MPI); OpenMP programming; Compute Unified Device Architecture (CUDA)

CLC Number: TP311.1 [Automation and Computer Technology - Computer Software and Theory]

 
