Porting and optimization of GROMACS 2020 on ROCm platform


Authors: ZHANG Yu-zhou; CAO Wu-di; BU Jing-de; TAN Guang-ming [2]; JI Qing

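Affiliations: [1] Joint Laboratory of Advanced Computing for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China; [2] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China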

Source: Computer Engineering & Science, 2021, No. 11, pp. 1901-1909 (9 pages)

Funding: National Key Research and Development Program of China (2018YFB0204400).

Abstract: GROMACS is a widely used open-source molecular dynamics simulation package. At present it is accelerated mainly on NVIDIA GPUs through CUDA. ROCm is an open-source platform for high-performance heterogeneous computing. Using HIP, the programming language of the ROCm platform, this paper presents the first complete port of the GROMACS 2020 series to ROCm. On an MI50 GPU, with a complex ionic liquid simulation as the target case, the ported code was profiled with the GPU profiling tool rocprof. Guided by the hardware characteristics of the MI50, the bonded force kernel, the PME kernel for electrostatic forces, and the short-range non-bonded force kernel were optimized in turn. After optimization, the target case runs about 2.8 times faster overall than with the initial port. On the MI50, the optimized code outperforms the original GROMACS OpenCL code by 60.5% and achieves a speedup of about 2.7 times over the CPU-only version. In single-node tests of two other representative cases and in multi-node scalability tests of the ionic liquid case, the optimized code also shows good performance gains, which indicates that the optimizations are reasonably general.
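The three ratios quoted in the abstract can be put on one scale. The following is a minimal sketch, not taken from the paper: it assumes all three ratios refer to the same target case and hardware setup, so that they can be chained, and it normalizes everything to the CPU-only run. The function and constant names are hypothetical, and because the source figures are approximate ("about 2.8", "about 2.7"), the derived values are rough estimates only.

```python
# Ratios reported in the abstract (approximate; assumed to describe the
# same ionic liquid target case on the same MI50 setup).
OPTIMIZED_OVER_INITIAL = 2.8    # optimized HIP code vs. initial HIP port
OPTIMIZED_OVER_OPENCL = 1.605   # "60.5% higher" than the original OpenCL code
OPTIMIZED_OVER_CPU = 2.7        # optimized HIP code vs. CPU-only version


def relative_to_cpu():
    """Return each version's estimated performance as a multiple of CPU-only."""
    optimized = OPTIMIZED_OVER_CPU
    return {
        "cpu_only": 1.0,
        "initial_hip_port": optimized / OPTIMIZED_OVER_INITIAL,
        "opencl": optimized / OPTIMIZED_OVER_OPENCL,
        "optimized_hip": optimized,
    }


if __name__ == "__main__":
    for name, value in relative_to_cpu().items():
        print(f"{name:>18}: {value:.2f}x")
```

Under these assumptions, the initial port would sit slightly below the CPU-only version and the OpenCL code between the two, which is consistent with the abstract's account that the optimizations, rather than the port alone, produced the gain.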

Keywords: molecular dynamics; GROMACS; ROCm; application porting; performance optimization

CLC number: TP301 [Automation and Computer Technology: Computer Systems Architecture]

 
