Authors: WANG Xin[1], LI Jianan, HAN Lin, ZHAO Rongcai, ZHOU Qiangwei
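Affiliations: [1] School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China; [2] National Supercomputing Center in Zhengzhou (Zhengzhou University), Zhengzhou 450001, China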
Source: Computer Engineering and Applications, 2023, No. 10, pp. 75-85 (11 pages)
Funding: 2022 Major Science and Technology Special Project of Henan Province (221100210600).
Abstract: The local data share (LDS) on the domestic heterogeneous processor DCU (deep computing unit) is an explicitly addressed memory with low latency and high bandwidth. The OpenMP implementation on domestic heterogeneous systems provides no programming interface for LDS access, so the LDS hardware goes unused for efficient data access. To address this problem, this work studies the OpenMP Offload execution modes on the DCU platform, the method for allocating LDS, and the instruction structure specific to LDS memory access, and implements manual support for LDS access. Building on this optimization, it then automates LDS access for the different execution modes of OpenMP Offload, forming a set of efficient memory-access strategies for domestic heterogeneous platforms. Experiments on the PolyBench benchmark suite show an average speedup of 2.60 for both the manual and the automatic method in single-threaded mode, 1.38 for the manual method in multi-threaded non-SPMD mode, and 1.11 for the automatic method in multi-threaded SPMD mode. These results indicate that manual and automatic support for LDS access helps improve the execution speed of heterogeneous OpenMP programs.
Keywords: domestic processor DCU; local data share (LDS); OpenMP Offload; SPMD; non-SPMD
CLC number: TP332 [Automation and Computer Technology / Computer System Architecture]