An OpenCL Compiler for the Homegrown Heterogeneous Many-Core Processor on the Sunway TaihuLight Supercomputer

Cited by: 8


Authors: WU Ming-Chuan (伍明川), HUANG Lei (黄磊)[1], LIU Ying (刘颖)[1], HE Xian-Bo (何先波)[3], FENG Xiao-Bing (冯晓兵)[1]

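Affiliations: [1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049; [3] Computer School, China West Normal University, Nanchong, Sichuan 637009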

Source: Chinese Journal of Computers (《计算机学报》), 2018, No. 10, pp. 2236-2250 (15 pages)

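Funding: National Key R&D Program of China, High-Performance Computing Project (2016YFB0200800); Key Program of the National Natural Science Foundation of China (61432018); Innovative Research Groups Program (61521092); Nanchong Science and Technology Support Project (15A0068); China West Normal University Cultivation Project (13C002); China West Normal University Talent Program (17YC149).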

Abstract: In recent years, hardware design has trended toward heterogeneity, and it is an industry consensus that the difficulty of developing parallel programs effectively has become one of the bottlenecks restricting the development of heterogeneous systems. China's independently developed "Sunway TaihuLight" supercomputer uses the homegrown on-chip heterogeneous many-core processor SW26010. To reduce the programming burden and improve the efficiency of porting software, the authors designed and implemented an OpenCL compiler system that supports the homegrown SW26010 many-core processor. The compiler maps the OpenCL platform model, memory model, and execution model onto the SW26010 many-core processor, implements corresponding optimization mechanisms, and generates well-performing executables. Experiments verify the correctness and effectiveness of the compiler: typical OpenCL applications compiled with it significantly outperform the Intel Xeon Phi at small and medium input sizes and are comparable to NVIDIA GPUs; at larger input sizes, limited by the capacity of the local scratchpad memory (SPM), their performance is slightly lower than that of NVIDIA GPUs.

In recent years, with the tremendous development of integrated circuit technology, it has become possible to integrate multiple processor cores on a single chip to accomplish larger and more complex computational tasks, and processor architecture has evolved from single-core to multi-core and many-core. However, simply adding more cores of the same type also runs into a performance bottleneck. To further increase computing capability, systems have trended toward heterogeneous architectures, which can provide greater computing power and a better performance-to-power ratio. It has become an industry consensus that the programming model is one of the bottlenecks restricting the development of heterogeneous systems. The Sunway TaihuLight supercomputer is the world's first system with a peak performance greater than 100 PFlops. It is equipped with the homegrown heterogeneous many-core SW26010 CPU, which integrates both management processing elements and computing processing elements on one chip. With 260 processing elements per processor, a single SW26010 provides a peak performance of over 3 TFlops. Meanwhile, large-scale scientific and engineering computations, such as earth, ocean, and atmospheric system modeling, and other critical applications face major performance challenges. How to fully utilize the computing power of the homegrown heterogeneous platform to achieve high performance for critical applications is of significant academic and practical value. To reduce programming difficulty while improving software portability, we design and implement an OpenCL compiler for the SW26010 processor. Based on the OpenCL programming framework and the microarchitecture of the homegrown many-core processor, the compiler provides a mechanism for mapping the OpenCL platform, memory, and execution models onto the SW26010 many-core processor, and implements thread coarsening, data layout, and vectorization optimizations for it. This paper implements the sour
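The abstract states that the compiler maps the OpenCL execution model onto the SW26010 and applies thread coarsening. The C sketch below illustrates those two ideas under stated assumptions; it is not the authors' implementation. It assumes that work-groups are dealt out round-robin over the 64 computing processing elements (CPEs) of one SW26010 core group and that the work-items of a work-group run as a serial loop on one CPE. The kernel `vadd_item`, the coarsening factor `COARSEN`, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants (assumptions, not taken from the paper):
   NUM_CPES models the 64 CPEs of one SW26010 core group;
   COARSEN is an arbitrary thread-coarsening factor. */
#define NUM_CPES 64
#define COARSEN 4

/* Body of a hypothetical OpenCL vector-add kernel for the single
   work-item whose global id is `gid`. */
static void vadd_item(const float *a, const float *b, float *c, size_t gid) {
    c[gid] = a[gid] + b[gid];
}

/* Thread coarsening: one logical thread executes up to COARSEN
   consecutive work-items, never going past `limit`. */
static void vadd_coarsened(const float *a, const float *b, float *c,
                           size_t base, size_t limit) {
    for (size_t k = 0; k < COARSEN && base + k < limit; ++k)
        vadd_item(a, b, c, base + k);
}

/* Work that CPE `cpe_id` would perform: it takes every NUM_CPES-th
   work-group and runs that group's work-items as a serial loop
   instead of as hardware threads. */
void vadd_on_cpe(int cpe_id, const float *a, const float *b, float *c,
                 size_t n, size_t local_size) {
    assert(local_size > 0);
    size_t num_groups = (n + local_size - 1) / local_size;
    for (size_t g = (size_t)cpe_id; g < num_groups; g += NUM_CPES) {
        size_t start = g * local_size;
        size_t end = (start + local_size < n) ? start + local_size : n;
        for (size_t base = start; base < end; base += COARSEN)
            vadd_coarsened(a, b, c, base, end);
    }
}

/* Sequential stand-in for launching the kernel over an NDRange of n
   work-items; on real hardware each vadd_on_cpe call would run
   concurrently on its own CPE. */
void vadd_ndrange(const float *a, const float *b, float *c,
                  size_t n, size_t local_size) {
    for (int cpe = 0; cpe < NUM_CPES; ++cpe)
        vadd_on_cpe(cpe, a, b, c, n, local_size);
}
```

Calling `vadd_ndrange(a, b, c, n, 64)` sets `c[i] = a[i] + b[i]` for every `i < n`, including when `n` is not a multiple of the work-group size or of `COARSEN`. Merging several work-items into one loop iteration reduces per-work-item scheduling overhead, and the consecutive elements it exposes are the kind of pattern a vectorization pass could then pack into SIMD lanes.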

Keywords: OpenCL; heterogeneous; homegrown many-core processor; compiler system

CLC number: TP312 [Automation and Computer Technology: Computer Software and Theory]
