Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units  被引量:6

Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

在线阅读下载全文

作  者:XIONG QinGang LI Bo XU Ji FANG XiaoJian WANG XiaoWei WANG LiMin HE XianFeng GE Wei 

机构地区:[1]State Key Laboratory of Multiphase Complex Systems,Institute of Process Engineering,Chinese Academy of Sciences,Beijing 100190,China [2]Graduate University of Chinese Academy of Sciences,Beijing 100049,China

出  处:《Chinese Science Bulletin》2012年第7期707-715,共9页

基  金:supported by the National Natural Science Foundation of China (20221603 and 20906091)

摘  要:Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsic parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM for multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulation on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested for two-dimensional Couette flow and the results are in good agreement with the analytical solution. For both the oneand two-dimensional decomposition of space, the algorithm performs well as most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to the three-dimensional decomposition of space and other modeling methods including explicit grid-based methods.Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsic parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with main- stream CPUs, the performance of the LBM for multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulation on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication are incorporated to improve efficiency. The algo- rithm is tested for two-dimensional Couette flow and the results are in good agreement with the analytical solution. For both the one- and two-dimensional decomposition of space, the algorithm performs well as most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to the three-dimensional decomposition of space and other modeling methods including explicit grid-based methods.

关 键 词:格子BOLTZMANN方法 图形处理单元 并行算法 集群 COUETTE流 LBM模拟 OPENMP 直接数值模拟 

分 类 号:O35[理学—流体力学] TP334.7[理学—力学]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象