Authors: XIAO Han; ZHOU Qing-lei[2]; YAO Peng-zi[1]
Affiliations: [1] School of Information Science and Technology, Zhengzhou Normal University, Zhengzhou 450044, China; [2] School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2019, No. 1, pp. 26-30 (5 pages)
Funding: National Natural Science Foundation of China (61572444; 61250007)
Abstract: Matrix-vector multiplication has high time complexity, and traditional computing methods struggle to guarantee real-time performance and cross-platform portability. This paper proposes a parallel matrix-vector multiplication algorithm based on the Open Computing Language (OpenCL), in which the multiplication is decomposed into subtasks of different granularity. According to the corresponding degree of parallelism, each work-group computes the product of a row block of the matrix and the column vector, and each work-item computes the product of one row vector in that row block and the column vector; these tasks are assigned to the compute units and the processing elements, respectively. Experimental results show that, on an NVIDIA graphics processing unit (GPU) platform under the OpenCL framework, the parallel algorithm achieves speedups of 20.86x, 6.39x and 1.49x over a CPU-based serial algorithm, an OpenMP-based parallel algorithm and a Compute Unified Device Architecture (CUDA)-based parallel algorithm, respectively. These results verify the effectiveness and performance portability of the proposed parallel optimization method.
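The two-level decomposition described in the abstract (one work-group per row block, one work-item per row) can be illustrated with a small CPU emulation. This is a sketch under stated assumptions, not the authors' OpenCL kernel: the function name `matvec_workgroups` and the `rows_per_group` block size are hypothetical, and the nested loops only mimic what an OpenCL runtime would execute in parallel. In a real kernel, `group_id` and `local_id` would come from the OpenCL built-ins `get_group_id(0)` and `get_local_id(0)`, and `row` would equal `get_global_id(0)` when the local work size equals `rows_per_group`.

```python
def matvec_workgroups(matrix, vector, rows_per_group):
    """CPU emulation of a row-block / row decomposition of y = A x.

    Hypothetical sketch: each outer iteration plays the role of one
    work-group (a contiguous block of rows); each inner iteration plays
    the role of one work-item (one row-vector x column-vector product).
    """
    n_rows = len(matrix)
    if rows_per_group <= 0:
        raise ValueError("rows_per_group must be positive")
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("every matrix row must have len(vector) entries")

    result = [0.0] * n_rows
    # Round up so a partial final row block still gets a work-group.
    n_groups = (n_rows + rows_per_group - 1) // rows_per_group
    for group_id in range(n_groups):              # one "work-group" per row block
        first = group_id * rows_per_group
        last = min(first + rows_per_group, n_rows)
        for local_id in range(last - first):      # one "work-item" per row
            row = first + local_id                # global row index
            acc = 0.0
            for a, x in zip(matrix[row], vector):  # dot product of row and vector
                acc += a * x
            result[row] = acc
    return result


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
x = [1, 2, 3]
print(matvec_workgroups(A, x, 2))  # [14.0, 32.0, 50.0, 68.0, 86.0]
```

With five rows and `rows_per_group = 2`, three "work-groups" are formed and the last one holds a single row, which is why the group count is rounded up; the result does not depend on the block size, only the mapping of rows to groups does.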
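Keywords: matrix-vector multiplication; graphics processing unit; Open Computing Language; parallel algorithm
CLC number: TP311 [Automation and Computer Technology / Computer Software and Theory]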