Quantitative GPGPU Performance Model Targeting OpenCL Architecture  (Cited by: 3)

Authors: Zhu Junfeng (朱俊峰)[1], Chen Gang (陈钢)[2], Zhang Keliang (张珂良)[1], Wu Baifeng (吴百锋)[1]

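Affiliations: [1] School of Computer Science and Technology, Fudan University, Shanghai 200433; [2] No. 38 Research Institute, China Electronics Technology Group Corporation, Hefei 230088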

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2013, No. 5, pp. 1118-1125 (8 pages)

Funding: Supported by the Shanghai Key Discipline Construction Fund (B114) and the AMD University Cooperation Program

Abstract: To evaluate the execution performance that a data-level parallel (DLP) application can achieve once it is parallelized onto a GPU architecture, a quantitative GPGPU performance model targeting the OpenCL architecture is proposed. The model accounts for the main factors that affect the performance of a GPGPU program: global memory access, local memory access, overlap of computation with memory access, conditional branch divergence, and synchronization. By statically analyzing a DLP application under a specific OpenCL execution configuration, the model can estimate the application's execution time on a GPU architecture without requiring the actual GPGPU program to be written. Analytical and experimental results for matrix multiplication and parallel prefix sum on the AMD Radeon™ HD 5870 GPU and the NVIDIA GeForce™ GTX 280 GPU show that the model estimates the execution time of parallelized DLP applications with relatively good accuracy.

Keywords: GPU; GPGPU; data-level parallelism; OpenCL; performance model

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
