Affiliation: [1] School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
Source: Chinese Journal of Computers (《计算机学报》), 2020, No. 6, pp. 990-1009 (20 pages)
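Funding: Supported by the National Natural Science Foundation of China (61572394) and the National Key Research and Development Program of China (2017YFB0202002).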
Abstract: Mainstream heterogeneous parallel programming methods such as CUDA and OpenCL offer low-level programming abstractions and close-to-metal interfaces that cannot shield users from the details of the underlying hardware and runtime, which makes the programming logic complex and programming difficult and error-prone. Application performance is also bound to the underlying runtime environment: when the hardware architecture changes, the code must be modified and optimized for the new hardware's characteristics, so high-level applications cannot be kept unified. To simplify heterogeneous parallel programming, improve programming efficiency, and make high-level applications unified and cross-platform, this paper proposes UPPA (Unified Parallel Programming Architecture), a high-level unified parallel programming architecture for heterogeneous many-core systems. First, UPPA proposes the data-associated computation (DAC) programming model, which describes parallelism at different levels and in different patterns in a unified way, simplifies heterogeneous parallel programming logic, and provides a high-level unified parallel programming abstraction. Second, UPPA provides developers with a simple, unified programming interface through the DAC description language, which implements the DAC model with language extensions. High-level semantic structures preserve the parallel features of the application and guide the compilation and runtime system in automatically mapping high-level applications onto different hardware architectures, saving programming effort while keeping those applications unified. The language extensions that implement these semantic structures use C-compatible syntax, so the programming interface is easy to learn and easy to use. Finally, a prototype system consisting of a source-to-source compiler and runtime support is implemented on top of OpenCL. The runtime system encapsulates the OpenCL runtime APIs in runtime library functions; based on these functions, the source-to-source compiler generates standard OpenCL code from applications written in the DAC description language. Using OpenCL as an intermediate language, the compiler and runtime system execute high-level applications efficiently on different heterogeneous systems, providing good cross-platform support. We rebuilt multiple benchmarks selected from the Parboil and Rodinia benchmark suites with the DAC description language and tested them on two heterogeneous platforms, an NVIDIA GPU and an Intel MIC. The code size of each rebuilt benchmark is roughly equal to that of the serial code provided by the corresponding suite and only 13% to 64% of the suite's OpenCL code, which significantly reduces the workload of heterogeneous programming. With the support of the compilation and runtime system, the rebuilt code runs on both platforms without modification. Compared with the hand-written, optimized OpenCL code in the benchmark suites, the rebuilt code achieves 91% to 100% of its performance on the GPU and 76% to 98% on the MIC, which demonstrates the effectiveness of the proposed approach and the efficiency of the compilation and runtime system.
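Keywords: heterogeneous parallel programming; data-associated computation; parallel programming model; unified programming architecture; OpenCL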
Classification No.: TP312 [Automation and Computer Technology: Computer Software and Theory]