Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units 被引量：6

Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

作　　者：XIONG QinGang LI Bo XU Ji FANG XiaoJian WANG XiaoWei WANG LiMin HE XianFeng GE Wei

机构地区：[1]State Key Laboratory of Multiphase Complex Systems,Institute of Process Engineering,Chinese Academy of Sciences,Beijing 100190,China [2]Graduate University of Chinese Academy of Sciences,Beijing 100049,China

出　　处：《Chinese Science Bulletin》2012年第7期707-715,共9页

基　　金：supported by the National Natural Science Foundation of China (20221603 and 20906091)

摘　　要：Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsic parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM for multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulation on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested for two-dimensional Couette flow and the results are in good agreement with the analytical solution. For both the oneand two-dimensional decomposition of space, the algorithm performs well as most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to the three-dimensional decomposition of space and other modeling methods including explicit grid-based methods.Many-core processors, such as graphic processing units （GPUs）, are promising platforms for intrinsic parallel algorithms such as the lattice Boltzmann method （LBM）. Although tremendous speedup has been obtained on a single GPU compared with main- stream CPUs, the performance of the LBM for multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulation on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication are incorporated to improve efficiency. The algo- rithm is tested for two-dimensional Couette flow and the results are in good agreement with the analytical solution. For both the one- and two-dimensional decomposition of space, the algorithm performs well as most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to the three-dimensional decomposition of space and other modeling methods including explicit grid-based methods.

关键词：格子BOLTZMANN方法图形处理单元并行算法集群 COUETTE流 LBM模拟 OPENMP 直接数值模拟

分类号：O35[理学—流体力学] TP334.7[理学—力学]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units 被引量：6

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units 被引量：6

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索