Author affiliations: [1] College of Computer and Control Engineering, Nankai University, Tianjin 300071, China; [2] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100109, China
Source: Chinese Journal of Computers, 2018, Issue 10, pp. 2175-2192 (18 pages)
Funding: Supported by the National Natural Science Foundation of China (61872200); the Natural Science Foundation of Tianjin (16JCYBJC15200, 17JCQNJC00300); the Open Project of the State Key Laboratory of Computer Architecture (CARCH201504); the Tianjin Science and Technology Major Special Project on Big Data and Cloud Computing (15ZXDSGX00020); and the Specialized Research Fund for the Doctoral Program of Higher Education (20130031120029)
Abstract: GPUs have become general-purpose co-processors with high concurrency and high memory bandwidth. However, because GPUs and CPUs differ greatly in both architecture and programming model, programming CPU-GPU heterogeneous systems is difficult and time-consuming. CUDA (Compute Unified Device Architecture), the general-purpose parallel computing platform and programming model introduced by NVIDIA, enables thousands of threads on NVIDIA GPUs to be harnessed for high-performance computing and eases, to some extent, the use of GPU parallelism. Nevertheless, even with CUDA's concurrent-kernel and multi-stream techniques, it remains hard to fully control and utilize GPU computational resources and to reasonably schedule the computational tasks running on them; this is the main bottleneck in applying GPUs to practical workloads such as matrix operations from linear algebra and machine learning algorithms, and especially to irregular parallel applications. Building on hardware features of modern GPU architectures, this paper proposes CAGTP (CPU-Assisted GPU Thread Pool), a thread-pool-based GPU task parallel computing model in which the CPU assists with task scheduling, enabling shared-memory-style programming on CPU-GPU heterogeneous systems. First, CAGTP exploits page-locked memory and the unified virtual address space, supported by recent GPU architectures and CUDA versions, to improve CPU-GPU communication efficiency. Second, on the CPU side we design I/O task queues, a thread-block-level task scheduler, and task slots, which allow users to dynamically schedule the tasks to be computed on the GPU. On the GPU side we design the task-multiplexed kernel, the core of CAGTP, which achieves dynamic scheduling of thread blocks. Based on these mechanisms, CAGTP supports efficient fine-grained task interaction between CPU and GPU, avoids the overhead of repeatedly launching and stopping kernels in native CUDA programs, and effectively supports fine-grained irregular parallel task computation on GPUs. Finally, the model's API functions reduce the complexity and time consumption of programming CPU-GPU heterogeneous systems. Experimental results show that the task-scheduling overhead in CAGTP is only 5% of a kernel launch, and that the model improves the performance of typical linear-algebra and machine-learning algorithms such as general matrix multiplication, Cholesky decomposition, K-means, and K-nearest neighbors. CAGTP is also easy to extend to multiple GPUs, achieves load balance across GPUs with widely differing performance, and efficiently solves mixed-task workloads and application problems with irregular parallelism.
Keywords: heterogeneous computing system; Compute Unified Device Architecture (CUDA); thread pool; task parallelism; task-multiplexed kernel function
Classification code: TP393 [Automation and Computer Technology — Computer Application Technology]