Authors: 孙兆鹏 (SUN Zhao-peng); 周宽久 (ZHOU Kuan-jiu). College of Software, Dalian University of Technology, Dalian 116620, China
Source: 《计算机工程与科学》 (Computer Engineering & Science), 2021, No. 4, pp. 641-651 (11 pages)
Funding: Fundamental Research Funds for the Central Universities (DUT19ZD104).
Abstract: Heterogeneous computing is a special form of parallel computing that draws on different computing resources according to the characteristics of each task, and it offers substantial advantages in server computing performance, energy efficiency, and real-time behavior. Current heterogeneous computing environments, however, are complex to program and cannot guarantee trustworthiness. To address these problems, the paper proposes a programming framework based on the state transition matrix (STM) that integrates GPU and FPGA resources. The STM integrates the application programming interfaces (APIs) of CUDA and Vivado, and the standard C code needed for heterogeneous computing is generated automatically. The GPU and FPGA devices are connected over the PCI Express (PCIe) bus, so data can be transferred between these heterogeneous computing units without passing through system CPU memory. In addition, GPUDirect RDMA is used to implement PCIe communication with the FPGA as the master, which overcomes the weak read performance of PCIe communication with the GPU as the master. Experiments show that, compared with communication through shared memory, PCIe communication with the FPGA as the master improves communication efficiency by a factor of 1.4 (the English abstract in the source record gives 1.9), and the achieved data rate is close to the theoretical maximum bandwidth.
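Keywords: state transition matrix; heterogeneous computing; FPGA; GPU; PCIe
Classification: TP303 [Automation and Computer Technology: Computer System Architecture]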
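Illustration: the abstract describes a framework driven by a state transition matrix but gives no matrix contents. The following is a minimal sketch of the general table-driven technique in C, not the authors' generated code. All state names, event names, and action functions are hypothetical; the stub actions only record a digit in a trace where real code would call CUDA or FPGA driver APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states and events for a GPU-to-FPGA pipeline. */
typedef enum { ST_IDLE, ST_GPU_BUSY, ST_TRANSFER, ST_FPGA_BUSY, ST_COUNT } State;
typedef enum { EV_START, EV_GPU_DONE, EV_DMA_DONE, EV_FPGA_DONE, EV_COUNT } Event;

typedef void (*Action)(int *trace);

/* One matrix cell: the action to run and the state to enter afterwards. */
typedef struct {
    Action action; /* NULL means the event is ignored in this state */
    State  next;
} Cell;

/* Stub actions (assumptions): each appends one digit to the trace. */
static void launch_kernel(int *trace) { *trace = *trace * 10 + 1; }
static void start_dma(int *trace)     { *trace = *trace * 10 + 2; }
static void start_fpga(int *trace)    { *trace = *trace * 10 + 3; }
static void collect(int *trace)       { *trace = *trace * 10 + 4; }

/* The state transition matrix: rows are states, columns are events.
   Cells not listed are zero-initialized, so their action is NULL. */
static const Cell stm[ST_COUNT][EV_COUNT] = {
    [ST_IDLE][EV_START]         = { launch_kernel, ST_GPU_BUSY  },
    [ST_GPU_BUSY][EV_GPU_DONE]  = { start_dma,     ST_TRANSFER  },
    [ST_TRANSFER][EV_DMA_DONE]  = { start_fpga,    ST_FPGA_BUSY },
    [ST_FPGA_BUSY][EV_FPGA_DONE] = { collect,      ST_IDLE      },
};

/* Apply one event: look up the cell, run its action, return the next state.
   An empty cell leaves the state unchanged. */
State stm_step(State s, Event e, int *trace) {
    assert(s < ST_COUNT && e < EV_COUNT);
    const Cell *cell = &stm[s][e];
    if (cell->action == NULL) {
        return s;
    }
    cell->action(trace);
    return cell->next;
}

/* Feed a sequence of events starting from ST_IDLE; return the final state. */
State stm_run(const Event *events, size_t n, int *trace) {
    State s = ST_IDLE;
    for (size_t i = 0; i < n; i++) {
        s = stm_step(s, events[i], trace);
    }
    return s;
}
```

Calling `stm_run` with the events `EV_START, EV_GPU_DONE, EV_DMA_DONE, EV_FPGA_DONE` walks the full cycle and returns to `ST_IDLE`. Because every legal (state, event) pair is an explicit cell, unhandled combinations are visible as empty cells, which is one reason a matrix form lends itself to checking and to code generation.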