Authors: LIN Lin (林琳); ZHU Aiqi (祝爱琦); ZHAO Mingcan (赵明璨); ZHANG Shuai (张帅); YE Yanhao (叶炎昊); XU Ji (徐骥) [2]; HAN Lin (韩林); ZHAO Rongcai (赵荣彩); HOU Chaofeng (侯超峰) [2]
Affiliations: [1] School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China; [2] Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China; [3] National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China
Source: Computer Engineering (《计算机工程》), 2023, No. 4, pp. 166-173 (8 pages)
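Funding: National Natural Science Foundation of China (21776280, 22073103); Beijing Natural Science Foundation (JQ21034); Major Science and Technology Project of Henan Province (201400211300).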
Abstract: Molecular Dynamics (MD) simulation is one of the main methods used to study the thermodynamic properties of silicon nano-films. However, the large data volumes, the computational intensity, and the complexity of the interatomic interaction models limit how far MD simulation can be applied. In the crystalline silicon MD simulation algorithm, non-contiguous data access and a large number of branch judgments waste parallel resources and leave threads waiting. To address these problems, this study designs the crystalline silicon MD simulation algorithm around the hardware architecture of the Nvidia Tesla V100 Graphics Processing Unit (GPU). Optimization methods such as coalesced global memory access, loop unrolling, and atomic operations exploit the parallel and floating-point computing capabilities of the GPU. They reduce device memory accesses, as well as branch conflicts and judgment instructions during execution, and so improve the overall computing performance of the algorithm. The test results show that the optimized crystalline silicon MD simulation algorithm is 1.69-1.97 times faster than the unoptimized algorithm. Compared with the mainstream GPU-accelerated MD simulation software packages HOOMD-blue and LAMMPS, it is 3.20-3.47 and 17.40-38.04 times faster, respectively.
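Keywords: molecular dynamics; Graphics Processing Unit (GPU); coalesced memory access; loop unrolling; atomic operations; performance optimization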
Classification No.: TP391.9 [Automation and Computer Technology: Computer Application Technology]
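Note on coalesced access: the abstract names coalesced global memory access as one of the optimizations but does not give the data layout used. The Python sketch below is only an illustration of why layout matters, not the authors' implementation. It counts how many memory segments one warp touches when each thread loads a single coordinate. The 128-byte segment size, the 8-byte double-precision coordinates, and the three-field atom record are assumptions made for this example; the function name `segments_touched` is hypothetical.

```python
WARP_SIZE = 32        # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128   # assumed memory transaction granularity (illustrative)
DOUBLE_BYTES = 8      # assumed size of one coordinate (double precision)


def segments_touched(stride_bytes, warp_size=WARP_SIZE,
                     elem_bytes=DOUBLE_BYTES, segment_bytes=SEGMENT_BYTES):
    """Count distinct memory segments hit when every lane of a warp
    loads one element, with consecutive lanes `stride_bytes` apart."""
    segments = set()
    for lane in range(warp_size):
        first = lane * stride_bytes          # first byte read by this lane
        last = first + elem_bytes - 1        # last byte read by this lane
        # An element may straddle a segment boundary, so add every
        # segment between its first and last byte.
        segments.update(range(first // segment_bytes,
                              last // segment_bytes + 1))
    return len(segments)


# Structure of arrays: x[0], x[1], ... are contiguous, so lanes are 8 bytes apart.
soa_segments = segments_touched(stride_bytes=DOUBLE_BYTES)

# Array of structures: each atom stores (x, y, z), so lanes are 24 bytes apart.
aos_segments = segments_touched(stride_bytes=3 * DOUBLE_BYTES)

print("SoA segments per warp load:", soa_segments)
print("AoS segments per warp load:", aos_segments)
```

Under these assumptions the contiguous layout touches 2 segments per warp load and the interleaved layout touches 6, which is the kind of extra memory traffic that coalescing is meant to avoid.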