Authors: GAO Huanqin (高缓钦), CHEN Hongquan (陈红全)[1,2], ZHANG Jiale (张加乐)[1,2], JIA Xuesong (贾雪松)
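Affiliations: [1] College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; [2] Key Laboratory of Unsteady Aerodynamics and Flow Control (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016, China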
Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2023, No. 8, pp. 32-42 (11 pages)
Funding: National Natural Science Foundation of China (11972189, 12102188).
Abstract: To improve the parallel efficiency of solving the Navier-Stokes equations, a graphics processing unit (GPU) parallel algorithm is developed by porting the Runge-Kutta discontinuous Galerkin (RKDG) method to the GPU architecture. The port constructs thread hierarchies that map to high-order finite elements and to element edges, together with the corresponding GPU kernels. Data storage and access are designed to be compatible with GPU memory types of differing latency. On structured meshes, the data-dependence region is ordered and already satisfies the requirements of aligned, coalesced memory access well; on unstructured meshes, the irregular data-dependence region degrades memory-access efficiency. To remedy this, a multi-layered element reordering technique suited to the high-order finite element framework is proposed. Starting from the initial mesh topology, layer structures of elements or element edges are built and reordered layer by layer, then assembled into a data layout suitable for aligned, coalesced access. A worked reordering example details the implementation. Numerical results for typical flow simulations show that the algorithm attains the expected order of accuracy, that the computed results agree well with experimental data or with results in the existing literature, and that the maximum GPU speedup reaches 67.47. The unstructured-mesh cases further confirm that the algorithm can handle fairly complex geometric boundaries and that the proposed reordering yields additional acceleration.
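The layer-by-layer reordering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say how layers are built, so the sketch assumes breadth-first layers grown from a seed element over an element-adjacency list. The names `layered_reorder`, `renumber`, `neighbors`, and `seed` are hypothetical.

```python
from collections import deque


def layered_reorder(neighbors, seed=0):
    """Group elements into breadth-first layers and list them layer by layer.

    neighbors[i] holds the old indices of the elements adjacent to element i.
    Returns (order, layer): order[k] is the old index of the element stored
    at new position k, and layer[i] is the layer number of old element i.
    ASSUMPTION: BFS layering is a stand-in for the paper's layering scheme.
    """
    n = len(neighbors)
    layer = [-1] * n
    order = []
    # Start at the seed, then cover any disconnected parts of the mesh.
    for start in [seed] + [i for i in range(n) if i != seed]:
        if layer[start] != -1:
            continue
        layer[start] = 0
        queue = deque([start])
        while queue:
            elem = queue.popleft()
            order.append(elem)
            for nb in sorted(neighbors[elem]):
                if layer[nb] == -1:
                    layer[nb] = layer[elem] + 1
                    queue.append(nb)
    return order, layer


def renumber(neighbors, order):
    """Rewrite the adjacency list in the new numbering given by order."""
    old_to_new = {old: new for new, old in enumerate(order)}
    return [[old_to_new[nb] for nb in neighbors[old]] for old in order]


# A chain of six elements whose stored numbering is scrambled:
# physical order along the chain is 0, 3, 1, 4, 2, 5.
neighbors = [[3], [3, 4], [4, 5], [0, 1], [1, 2], [2]]
order, layer = layered_reorder(neighbors, seed=0)
new_neighbors = renumber(neighbors, order)
print(order)          # [0, 3, 1, 4, 2, 5]
print(new_neighbors)  # [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
```

After renumbering, each element's neighbors sit at adjacent indices, which is the kind of locality that helps consecutive GPU threads read consecutive memory addresses. Whether the paper orders elements within a layer the same way is not stated in this record.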
Keywords: RKDG method; GPU; layered reordering; unstructured mesh; Navier-Stokes equations
Classification: V211.3 [Aerospace Science and Technology: Aerospace Propulsion Theory and Engineering]