Authors: LIU Xin (刘鑫); GUO Heng (郭恒); SUN Ru-Jun (孙茹君); CHEN Zuo-Ning (陈左宁)
Affiliations: [1] National Research Centre of Parallel Computer Engineering and Technology, Wuxi, Jiangsu 214083; [2] State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125
Source: Chinese Journal of Computers (《计算机学报》), 2018, No. 10, pp. 2209-2220 (12 pages)
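Funding: Supported by the "Global Change and Response" special project fund (2016YFA0602200) and the National Basic Research Program of China ("973" Program) (2014CB744100)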
Abstract: Complex application systems require computational simulation of whole systems and complete physical processes at natural scale, which places higher demands on computer capability. This paper presents large-scale experimental analysis results for ultra-large-scale parallel applications that run on half or more of the Sunway TaihuLight system, covering algorithm characteristics, architectural adaptability, computational complexity, memory access complexity, and communication complexity. Based on the computation and data movement characteristics of large-scale applications and the features of the heterogeneous many-core architecture, it proposes a new performance model, identifies the key factors that affect large-scale application performance, and states the design requirements that exascale complex applications place on future exascale computer systems.

Complex application systems face large computing simulations of the whole system and the whole physical process, in true three dimensions and at natural scale, which places higher requirements on supercomputer capability. Most large-scale applications on the Sunway TaihuLight supercomputer are the largest-scale runs in their corresponding fields and so partly represent the characteristics of those applications. This paper mainly analyzes the computation characteristics and data migration behavior of the half-scale and full-scale applications. First, we provide a brief introduction to the Sunway TaihuLight system, the architecture of the homegrown many-core SW26010 processor, and some parallel programming methods and architecture-related optimization methods that support large-scale parallel application development. According to the classification criteria of the University of California, Berkeley, we analyze applications of ten computing themes such as dense linear algebra, sparse linear algebra, spectral methods, N-body methods, structured grids, unstructured grids, Map-Reduce, graph traversal and dynamic programming. Focusing on the characteristics of the algorithm, the adaptability of the architecture, the algorithm complexity, the space complexity, the characteristics of memory access and the communication complexity, we identify the bottlenecks of extending the application algorithms to exascale. Based on the above analysis and the architecture characteristics of SW26010, we first propose a new performance model of one core group to help design efficient algorithms on a core group for different applications. According to this model, program designers should increase the execution efficiency of the CPU's computing units, improve the memory access bandwidth achieved by real applications, and reduce the volume and number of communications. For large-scale parallel applications, we also give a modified performance model for large problems. For most memory-intensive applications, how to reduce the volume and number of discrete memory accesses, improving the bandwidth of direc
Keywords: Sunway TaihuLight; large-scale applications; complexity analysis; computational characteristics
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]