Authors: 王一超[1], 秦强[1], 施忠伟[1], 林新华[1]
Affiliation: [1] Shanghai Jiao Tong University, Shanghai 200240
Source: Computer Science (《计算机科学》), 2015, No. 1, pp. 75-78 (4 pages)
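Abstract: OpenACC is a directive-based parallel programming standard. By adding directives that conform to the standard and compiling with an OpenACC compiler, programmers can port serial code in parallel form to accelerators or coprocessors and obtain the speedup that heterogeneous accelerators offer. Unlike heterogeneous parallel programming technologies such as CUDA and OpenCL, OpenACC aims to spare programmers from considering the underlying hardware architecture of the accelerator or coprocessor when porting an application, which lowers the programming difficulty. It also offers good cross-platform portability: a single code base can run on different hardware platforms. OpenACC is therefore a parallel programming standard worth studying. Today's heterogeneous acceleration hardware is increasingly diverse. Tianhe-2, ranked first on the November 2013 Top500 list, uses 48,000 coprocessors built on the Intel Knights Corner architecture. Meanwhile, NVIDIA's recently released Kepler-architecture GPUs have quickly gained a sizable user base thanks to the company's years of presence in the GPU market. For those porting applications without pursuing peak performance, balancing application performance against ease of porting is an important concern. OpenACC, which needs only one code base to run on both hardware platforms, meets users' need for ease of porting. Once ease of porting is addressed, how the same application performs on different hardware platforms becomes the question users most want answered. Through experiments and performance models, this paper shows the performance portability of applications ported with OpenACC on Intel Knights Corner and NVIDIA Kepler hardware.
English abstract: OpenACC is a programming standard designed to simplify heterogeneous parallel programming by using directives. Because OpenACC can generate OpenCL and CUDA code, and the CAPS HMPP compiler supports running OpenCL on Intel Knights Corner, it is attractive to use OpenACC on hardware with different underlying micro-architectures. This paper studies how realistic it is to use a single OpenACC source code for a set of hardware with different underlying micro-architectures. Intel Knights Corner and NVIDIA Kepler products are the targets in the experiment, since they have the latest architectures and similar peak performance. The CAPS OpenACC compiler is used to compile the EPCC OpenACC benchmark suite and the Stream and MaxFlops benchmarks from SHOC to assess performance. To study performance portability, a roofline model and a relative performance model were built from the experimental data. The results show that specific benchmarks achieve at most 82% of peak performance on Kepler and Knights Corner, but as arithmetic intensity rises, the average performance is approximately 10%. There is also a big performance gap between Intel Knights Corner and NVIDIA Kepler on several benchmarks. This study confirms that the performance portability of OpenACC is related to arithmetic intensity, and that a big performance gap still exists between different hardware platforms on specific benchmarks.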
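As a rough illustration of the two ideas the abstract relies on, the sketch below shows a loop annotated with an OpenACC directive and a helper that evaluates the standard roofline bound, min(peak, bandwidth × arithmetic intensity). It is not taken from the paper: the function names `saxpy` and `roofline_gflops`, the SAXPY kernel, and any numbers passed to them are assumptions chosen for illustration only.

```c
#include <stddef.h>

/* SAXPY: y[i] = a * x[i] + y[i].
 * The directive asks an OpenACC compiler to offload the loop and to copy
 * x to the device and y to and from it. A compiler without OpenACC
 * support ignores the pragma, and the loop simply runs serially. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Roofline bound (hypothetical helper, not the paper's model code):
 * attainable GFLOP/s = min(peak GFLOP/s, bandwidth GB/s * FLOPs per byte). */
double roofline_gflops(double peak_gflops, double bandwidth_gbs,
                       double flops_per_byte)
{
    double memory_bound = bandwidth_gbs * flops_per_byte;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

The same source file can be built for different targets by changing only the compiler and its options, which is the single-source property the abstract evaluates. The roofline helper shows why arithmetic intensity matters: a kernel with low intensity is capped by memory bandwidth, while one with high intensity is capped by peak compute.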
Classification: TP338.6 [Automation and Computer Technology: Computer System Architecture]