swLLVM: Optimized Compiler for New Generation Sunway Supercomputer

Cited by: 1


Authors: SHEN Li, ZHOU Wen-Hao, WANG Fei[4], XIAO Qian, WU Wen-Hao, ZHANG Lu-Fei, AN Hong[1], QI Feng-Bin[3]

Affiliations: [1] University of Science and Technology of China, Hefei 230026, Anhui, China; [2] National Research Center of Parallel Computer Engineering and Technology, Beijing 100190, China; [3] Jiangnan Institute of Computing Technology, Wuxi 214083, Jiangsu, China; [4] Tsinghua University, Beijing 100084, China

Source: Journal of Software (《软件学报》), 2024, No. 5, pp. 2359-2378 (20 pages)

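Funding: National Key Research and Development Program of China (2018YFB0204200); Major Project of the Science and Technology Department of Zhejiang Province (2022C01250).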

Abstract: Heterogeneous many-core architectures offer ultra-high energy efficiency and have become an important direction in supercomputer architecture. However, the complexity of heterogeneous systems places higher demands on application development and optimization, and their evolution faces many technical challenges in usability and programmability. The independently developed new-generation Sunway supercomputer is built on the domestic SW26010Pro heterogeneous many-core processor. To exploit the performance of this new-generation many-core processor and support the development and optimization of emerging scientific computing applications, this study designs and implements swLLVM, an optimizing compiler for the SW26010Pro platform. The compiler supports the Athread and SDAA dual-mode heterogeneous programming models and provides a multi-level storage hierarchy description and vector operation extensions. Targeting the architectural characteristics of SW26010Pro, it also implements control-flow vectorization, cost-based node combination, and compiler optimizations for the multi-level storage hierarchy. Experimental results show that these optimizations are effective: control-flow vectorization and node combination achieve average speedups of 1.23 and 1.11, respectively, and the memory-access optimizations achieve a maximum performance improvement of 2.49 times. Finally, swLLVM is evaluated along multiple dimensions with the SPEC CPU2006 standard benchmark suite. Compared with SWGCC at the same optimization level, swLLVM's performance on integer benchmarks decreases by 0.12% on average, its performance on floating-point benchmarks increases by 9.04% on average, overall performance increases by 5.25% on average, compilation speed increases by 79.1% on average, and code size decreases by 1.15% on average.
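The abstract names control-flow vectorization without describing how swLLVM performs it. As a generic illustration only, and not the swLLVM implementation, the C sketch below shows if-conversion, a common way to vectorize loops that contain branches: both arms of the branch are computed and a per-element predicate selects the result, so the loop body becomes straight-line compare, arithmetic, and select operations that map onto SIMD instructions. The function names and the vector length `VL = 4` are hypothetical.

```c
#include <stddef.h>

/* Original loop: the if/else inside the body is the control flow that
   blocks a straightforward SIMD translation. */
void threshold_branchy(double *a, const double *b, size_t n, double t) {
    for (size_t i = 0; i < n; ++i) {
        if (b[i] > t)
            a[i] = b[i] - t;
        else
            a[i] = 0.0;
    }
}

/* If-converted form: both arms are evaluated and a per-element predicate
   selects the result, so the loop body has no branches. The inner loop over
   VL lanes stands for one SIMD operation per statement. */
void threshold_ifconverted(double *a, const double *b, size_t n, double t) {
    enum { VL = 4 };  /* hypothetical vector length, for illustration only */
    size_t i = 0;
    for (; i + VL <= n; i += VL) {
        for (size_t l = 0; l < VL; ++l) {
            int mask = b[i + l] > t;            /* vector compare -> mask */
            double then_v = b[i + l] - t;       /* "then" arm, all lanes  */
            double else_v = 0.0;                /* "else" arm, all lanes  */
            a[i + l] = mask ? then_v : else_v;  /* vector select (blend)  */
        }
    }
    for (; i < n; ++i)                          /* scalar remainder loop */
        a[i] = (b[i] > t) ? b[i] - t : 0.0;
}
```

Both functions compute the same result. The if-converted form trades extra work (both arms are evaluated for every element) for a branch-free body that a vectorizer can map to compare, arithmetic, and blend instructions. Whether that trade pays off depends on the target's SIMD width and the cost of the arms.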

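Keywords: heterogeneous many-core; compilation system; programming model; storage hierarchy; vectorization; node combination; memory access optimization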

CLC number: TP314 [Automation and Computer Technology: Computer Software and Theory]
