Parallel Code Generation for Sunway Heterogeneous Architecture  Cited by: 4

Original title: 面向申威异构架构的并行代码自动生成

Authors: TAO Xiao-Han; ZHU Yu; PANG Jian-Min[1,2]; ZHAO Jie; XU Jin-Long[1,2]

Affiliations: [1] Information Engineering University, Zhengzhou 450001, Henan, China; [2] State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, Henan, China

Source: Journal of Software (软件学报), 2023, No. 4, pp. 1570-1593 (24 pages)

Funding: National Natural Science Foundation of China (61702546).

Abstract: Heterogeneous architectures are becoming the mainstream in high-performance computing. Compared with homogeneous multi-core architectures, however, their hardware structure and memory hierarchy are more complex, which makes programming harder. Advanced optimizing compilers can help programmers produce more efficient code and reduce development complexity. The polyhedral model abstracts a program into a polyhedral representation, which allows it to combine compositions of loop transformations with hardware mapping strategies and to generate code for a specific target architecture. This paper presents an automatic parallel code generation system for the domestic Sunway heterogeneous architecture, built on the polyhedral model and operating in source-to-source mode. The system maps the computation onto the hardware according to the characteristics of the Sunway architecture and automatically manages data transfers and memory space, minimizing data movement between the management processing element and the computing processing elements. Experiments are conducted on 13 linear algebra applications from the Polybench benchmark suite. The results show that the generated heterogeneous parallel code runs correctly on the Sunway platform and exploits its performance effectively, achieving a mean speedup of 539.16× on 64 threads over the sequential implementation executed on a management processing element.

Keywords: Sunway heterogeneous architecture; polyhedral model; parallel computing; code generation

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
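
Illustrative sketch: the abstract describes mapping computation onto computing processing elements while data transfer and local memory are managed automatically. The Python snippet below is only a rough illustration of that general pattern and is not output of the system described in the paper, nor does it use any Sunway API. The names `matvec_tiled`, `matvec_reference`, `local_a`, `local_x`, and `TILE` are hypothetical, and list slicing stands in for a DMA transfer into a small local buffer. It shows a matrix-vector product computed tile by tile, where each tile is first copied into local storage and then used for computation.

```python
TILE = 4  # hypothetical tile size, assumed small enough to fit a local buffer


def matvec_reference(a, x):
    """Plain sequential y = A * x, used as the baseline for comparison."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def matvec_tiled(a, x, tile=TILE):
    """Compute y = A * x one tile at a time through local copies."""
    n_rows = len(a)
    n_cols = len(x)
    y = [0.0] * n_rows
    for it in range(0, n_rows, tile):
        for jt in range(0, n_cols, tile):
            # "Transfer in": the slice copies stand in for a DMA get
            # from main memory into a per-core local buffer.
            local_a = [row[jt:jt + tile] for row in a[it:it + tile]]
            local_x = x[jt:jt + tile]
            # Compute using only the local copies.
            for i, local_row in enumerate(local_a):
                acc = 0.0
                for a_ij, x_j in zip(local_row, local_x):
                    acc += a_ij * x_j
                # "Transfer out": accumulate the partial result for this row.
                y[it + i] += acc
    return y


if __name__ == "__main__":
    n = 8
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    x = [1.0] * n
    # The tiled version should agree with the sequential baseline.
    assert matvec_tiled(a, x) == matvec_reference(a, x)
    print(matvec_tiled(a, x))
```

In a real generator the tile size would be derived from the capacity of the target's local memory and the outer tile loops would be distributed across threads; here both are fixed by assumption to keep the sketch short.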

 
