Parallel optimization of the Tend_lin application on the Sunway TaihuLight supercomputer  Cited by: 2


Authors: JIANG Shang-zhi; TANG Sheng-lin [2]; GAO Xi-ran; HUA Rong; CHEN Li [2]; LIU Ying [2] (College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)

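Affiliations: [1] College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, Shandong, China; [2] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China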

Source: Computer Engineering & Science, 2020, No. 10, pp. 1842-1851 (10 pages)

Funding: National Key Research and Development Program of China (2016YFB0200803); National Natural Science Foundation of China (61521092).

Abstract: Atmospheric general circulation models are one of the main tools for studying global climate change and its causes, and running such complex models efficiently on large-scale heterogeneous many-core parallel systems is a challenging problem. Tend_lin is the hot-spot procedure in the dynamical core of IAP AGCM-4, the fourth-generation atmospheric general circulation model developed by the Institute of Atmospheric Physics, Chinese Academy of Sciences, and it has a low compute-to-communication ratio. Targeting the Sunway TaihuLight, a domestically developed large-scale heterogeneous many-core supercomputer, the paper optimizes Tend_lin with two different parallel programming interfaces, OpenACC and AceMesh. It focuses on acceleration with AceMesh, a data-driven task-parallel programming interface: it describes how computation loops and MPI communication code are parallelized as tasks, discusses how to relax the sharing of communication resources, and compares task mapping under a single-level task graph and a nested task graph. The experimental results show that, compared with the OpenACC version, AceMesh achieves a speedup of about 2x on average across parallel configurations of 16 to 1024 processes. The paper closes with a detailed analysis of the sources of the performance gain.

Keywords: atmospheric general circulation model; high resolution; data-driven task-parallel language; OpenACC; MPI

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
