A RKDG GPU parallel algorithm and its acceleration with reordering

Authors: GAO Huanqin, CHEN Hongquan [1,2], ZHANG Jiale [1,2], JIA Xuesong

Affiliations: [1] College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; [2] Key Laboratory of Unsteady Aerodynamics and Flow Control (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016, China

Source: Journal of Harbin Institute of Technology, 2023, No. 8, pp. 32-42 (11 pages)

Funding: National Natural Science Foundation of China (11972189, 12102188).

Abstract: To improve the efficiency of solving the Navier-Stokes equations in parallel, a graphics processing unit (GPU) parallel algorithm is developed by porting the Runge-Kutta discontinuous Galerkin (RKDG) method to the GPU architecture. The port constructs thread hierarchies that map to high-order finite elements and to element boundaries (edges), together with the corresponding GPU kernels. Data storage and access are designed to be compatible with GPU memories of differing latencies. On structured meshes, the data-dependence region is ordered and already meets the GPU requirement of aligned, coalesced memory access fairly well. On unstructured meshes, however, the irregular data-dependence region degrades memory-access efficiency. To remedy this, a multi-layered element reordering technique suited to the high-order finite element framework is proposed. Starting from the initial mesh topology, layer structures of elements or element edges are created and reordered layer by layer, and the layers are then assembled into a data storage structure suitable for aligned, coalesced access. A reordering example is given with the implementation process detailed. Numerical results of typical flow simulations show that the algorithm attains the expected order of accuracy, that the computed results agree well with experimental data or other computed results in the existing literature, and that the maximum GPU speedup reaches 67.47. Unstructured-mesh cases further confirm that the algorithm can handle relatively complex geometric boundaries and that the proposed reordering technique yields additional acceleration.
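The layer-by-layer reordering described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the layers are built, so the breadth-first construction from a seed element, the function names (`layered_reorder`, `renumber`, `max_index_gap`), and the `-1` boundary marker are all assumptions made for illustration, not the authors' implementation.

```python
def layered_reorder(neighbors, seed=0):
    """Group elements into layers by breadth-first traversal (assumed strategy).

    neighbors[e] lists the elements sharing an edge with element e;
    -1 marks a boundary edge (hypothetical convention).
    Returns (new_order, layers), where new_order[k] is the old index of the
    element stored at position k after reordering.
    """
    n = len(neighbors)
    visited = [False] * n
    new_order, layers = [], []
    # Try the seed first, then any element not yet reached, so that
    # disconnected mesh regions are also covered.
    for start in [seed] + [e for e in range(n) if e != seed]:
        if visited[start]:
            continue
        visited[start] = True
        current = [start]
        while current:
            layers.append(current)
            new_order.extend(current)
            nxt = []
            for elem in current:
                for nb in sorted(neighbors[elem]):
                    if nb >= 0 and not visited[nb]:
                        visited[nb] = True
                        nxt.append(nb)
            current = nxt
    return new_order, layers


def renumber(neighbors, new_order):
    """Rewrite the adjacency table in the new numbering."""
    old_to_new = [0] * len(new_order)
    for new, old in enumerate(new_order):
        old_to_new[old] = new
    return [
        [old_to_new[nb] if nb >= 0 else -1 for nb in neighbors[old]]
        for old in new_order
    ]


def max_index_gap(neighbors):
    """Largest index distance between an element and any of its neighbors."""
    return max(
        abs(e - nb)
        for e, nbs in enumerate(neighbors)
        for nb in nbs
        if nb >= 0
    )


# A strip of six triangles whose initial numbering is scattered:
# geometric order along the strip is 3, 0, 5, 1, 4, 2.
mesh = [[3, 5, -1], [5, 4, -1], [4, -1, -1],
        [0, -1, -1], [1, 2, -1], [0, 1, -1]]

order, layers = layered_reorder(mesh, seed=3)
reordered = renumber(mesh, order)
print(order)                                        # [3, 0, 5, 1, 4, 2]
print(max_index_gap(mesh), max_index_gap(reordered))  # 5 1
```

In this toy case the largest index distance between neighboring elements drops from 5 to 1, so neighbor data would sit in adjacent storage slots. That locality is the kind of property coalesced GPU access benefits from, although the actual gain depends on the kernel and data layout.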

Keywords: RKDG method; GPU; layered reordering; unstructured mesh; Navier-Stokes equations

CLC number: V211.3 [Aerospace Science and Technology: Aerospace Propulsion Theory and Engineering]
