Heterogeneous Parallel Computing and Performance Optimization for DSMC/PIC Coupled Simulation Based on MPI+CUDA

Authors: LIN Yongzhen; XU Chuanfu [1,2]; QIU Haozhong; WANG Qingsong; WANG Zhenghua [2]; YANG Fuxiang; LI Jie [3]

Affiliations: [1] Institute for Quantum Information & State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410000, China; [2] College of Computer, National University of Defense Technology, Changsha 410000, China; [3] College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410000, China; [4] Army Military Transportation University, Bengbu, Anhui 233000, China

Source: Computer Science (《计算机科学》), 2024, No. 9, pp. 31-39 (9 pages)

Abstract: DSMC/PIC coupled simulation is an important class of high-performance computing applications. Large-scale DSMC/PIC coupled simulations are computationally very expensive and require efficient parallel computing. Because particles are dynamically injected and migrate between subdomains, MPI-parallel DSMC/PIC coupled simulations often incur large communication overheads and have difficulty achieving load balance. For a self-developed DSMC/PIC coupled simulation code, we design and implement an efficient MPI+CUDA heterogeneous parallel algorithm on top of the existing optimized MPI-parallel version. Taking into account both the GPU architecture and the computational characteristics of DSMC/PIC, we carry out a series of performance optimizations: GPU memory access optimization, GPU thread workload optimization, CPU-GPU data transfer optimization, and DSMC/PIC data conflict optimization. On NVIDIA V100 and A100 GPUs of the Beijing Beilong Super Cloud HPC system, we perform large-scale heterogeneous parallel DSMC/PIC coupled simulations of a pulsed vacuum arc plasma plume application with hundreds of millions of particles. Compared with the original pure-MPI parallel version, the GPU heterogeneous parallel version significantly reduces simulation time: two GPU cards reach a speedup of 550% relative to 192 CPU cores, and the GPU version also shows better strong scalability.
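One common way to avoid the write conflicts that arise when many particles deposit onto the same grid cell is to sort particles by cell and let each cell reduce its own contiguous segment. The sketch below is illustrative only: the abstract does not state which conflict-resolution scheme the authors use, and the function name `deposit_by_cell` and its inputs are hypothetical. It is written in plain Python for readability; on a GPU, each iteration of the per-cell loop would roughly correspond to one thread or thread block.

```python
def deposit_by_cell(cell_ids, weights, num_cells):
    """Accumulate particle weights per cell with one writer per cell.

    Hypothetical helper for illustration; not taken from the paper.
    cell_ids[p] is the index of the cell that owns particle p.
    """
    # Step 1: order particle indices by owning cell
    # (a GPU code would typically use a counting or radix sort).
    order = sorted(range(len(cell_ids)), key=lambda p: cell_ids[p])

    # Step 2: prefix sum of per-cell particle counts gives the start
    # offset of each cell's segment in the sorted order.
    starts = [0] * (num_cells + 1)
    for c in cell_ids:
        starts[c + 1] += 1
    for c in range(num_cells):
        starts[c + 1] += starts[c]

    # Step 3: each cell sums only its own segment, so no two
    # workers ever write to the same output entry.
    density = [0.0] * num_cells
    for c in range(num_cells):
        density[c] = sum(weights[order[i]]
                         for i in range(starts[c], starts[c + 1]))
    return density
```

The alternative of letting every particle add directly into its cell needs atomic updates on a GPU; the sort-then-reduce form trades that contention for the cost of the sort.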

Keywords: DSMC/PIC coupling; particle simulation; heterogeneous parallelism; MPI+CUDA

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
