Authors: TAO Xiao-Han; ZHU Yu; PANG Jian-Min[1,2]; ZHAO Jie; XU Jin-Long[1,2]
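Affiliations: [1] Information Engineering University, Zhengzhou, Henan 450001, China; [2] State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, Henan 450001, China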
Source: Journal of Software, 2023, No. 4, pp. 1570-1593 (24 pages)
Funding: National Natural Science Foundation of China (61702546).
Abstract: Heterogeneous architectures are becoming the mainstream in high-performance computing. Compared with homogeneous multi-core architectures, however, their hardware structure and memory hierarchy are more complex, which makes them harder to program. Advanced optimizing compilers can help developers produce more efficient code and reduce development complexity. The polyhedral compilation model abstracts a program into a polyhedral representation of its iteration space, which allows compositions of loop transformations to be combined with hardware-mapping strategies and architecture-specific code to be generated. This paper designs and implements a source-to-source automatic parallel code generation system for the domestic Sunway heterogeneous architecture, built on the polyhedral model. Tailored to the characteristics of the Sunway architecture, the system deploys a program's computation onto the hardware and automatically manages data transfers and memory space, minimizing data movement between the management processing element (MPE) and the computing processing elements (CPEs). Experiments were conducted on 13 linear-algebra kernels from the Polybench benchmark suite. The results show that the generated heterogeneous parallel code runs correctly on the Sunway heterogeneous platform and exploits its performance effectively: with 64 threads, the mean speedup over sequential execution on the MPE reaches 539.16×.
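To make the abstract's description of "deploying computation onto the CPEs and managing data transfers" more concrete, the sketch below shows the general pattern on a matrix multiplication (a typical Polybench-style linear-algebra kernel): the loop nest is tiled, each tile of the output is assigned to one of several workers, and each worker stages its operand tiles into small local buffers before computing. This is a minimal illustration under stated assumptions, not the code generated by the authors' system and not the Sunway programming interface. The names `tiled_matmul`, `copy_tile`, `N`, `TILE`, and `NUM_WORKERS` are hypothetical; slice copies stand in for explicit DMA transfers into a compute core's local memory, and workers run one after another rather than as real threads.

```python
# Hypothetical sketch: tiling + static tile-to-worker mapping + explicit
# staging of operand tiles into "local" buffers (standing in for DMA).
# Not the paper's generator output; workers run sequentially here.

N = 64            # matrix size (assumption)
TILE = 8          # tile edge; 64 / 8 = 8 -> 8 * 8 = 64 output tiles
NUM_WORKERS = 64  # assumed one worker per compute core / thread


def copy_tile(src, row0, col0, size):
    """Copy a size x size block out of src (stand-in for a DMA 'get')."""
    return [src[row0 + i][col0:col0 + size] for i in range(size)]


def tiled_matmul(A, B, n=N, tile=TILE, num_workers=NUM_WORKERS):
    """Compute C = A @ B by distributing tiles of C over num_workers."""
    if n % tile != 0:
        raise ValueError("n must be a multiple of tile")
    C = [[0.0] * n for _ in range(n)]
    tiles_per_dim = n // tile
    tile_ids = [(ti, tj) for ti in range(tiles_per_dim)
                for tj in range(tiles_per_dim)]
    # Each iteration of this loop is what one worker thread would execute.
    for worker in range(num_workers):
        # Static round-robin assignment of output tiles to workers.
        for ti, tj in tile_ids[worker::num_workers]:
            acc = [[0.0] * tile for _ in range(tile)]  # local accumulator
            for tk in range(tiles_per_dim):
                a = copy_tile(A, ti * tile, tk * tile, tile)  # stage A block
                b = copy_tile(B, tk * tile, tj * tile, tile)  # stage B block
                for i in range(tile):
                    acc_i, a_i = acc[i], a[i]
                    for k in range(tile):
                        a_ik, b_k = a_i[k], b[k]
                        for j in range(tile):
                            acc_i[j] += a_ik * b_k[j]
            # Write the finished tile back once (stand-in for a DMA 'put').
            for i in range(tile):
                C[ti * tile + i][tj * tile:(tj + 1) * tile] = acc[i]
    return C


if __name__ == "__main__":
    I = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
    M = [[float(i * N + j) for j in range(N)] for i in range(N)]
    assert tiled_matmul(M, I) == M
    print("ok")
```

Keeping the whole output tile in a local accumulator means each tile of `C` is written back to main memory only once, which is one simple way to reduce traffic between main memory and the compute cores; how the actual system chooses tile sizes and transfer placement is not stated in this record.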
CLC classification: TP311 [Automation and Computer Technology: Computer Software and Theory]