A Matrix Multiplication Parallel Algorithm Based on OpenCL on Heterogeneous Platforms  Cited by: 3


Authors: XIAO Han [1], XIAO Shi-yang [2], LI Cai-lin [3], ZHOU Qing-lei [4]

Affiliations: [1] School of Information Science and Technology, Zhengzhou Normal University, Zhengzhou 450044, China; [2] School of Civil Engineering, Northeast Forestry University, Harbin 150040, China; [3] School of Civil and Architectural Engineering, Shandong University of Technology, Zibo, Shandong 255000, China; [4] School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China

Source: Journal of Southwest University (Natural Science Edition), 2020, Issue 11, pp. 147-153 (7 pages)

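Funding: National Natural Science Foundation of China (41601496, 41701525, 61572444); Natural Science Foundation of Shandong Province (ZR2017LD002); Key Research and Development Program of Shandong Province (2018GGX106002).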

Abstract: Based on an analysis of the underlying hardware architecture of the Open Computing Language (OpenCL) platform, this paper optimizes the matrix multiplication algorithm from several angles, including data localization, computing resource utilization, and memory access bandwidth utilization, and accelerates matrix multiplication under the OpenCL architecture. Experimental data show that the OpenCL-based parallel matrix multiplication algorithm is more efficient than the single-threaded CPU algorithm, the multi-threaded algorithm based on Open Multi-Processing (OpenMP), and the parallel algorithm based on Compute Unified Device Architecture (CUDA).
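The abstract names data localization as one optimization angle, but this page does not reproduce the paper's kernel. The following is only a minimal sketch of the general idea, written as a CPU emulation in Python: the output matrix is split into tiles, and each tile's inputs are staged into small buffers before use, much as an OpenCL work-group might copy tiles from global memory into local memory. The function names and the tile size `TILE` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
# Illustrative only: a CPU emulation of work-group tiling, not the paper's kernel.
TILE = 2  # hypothetical tile edge; an assumption for this sketch


def matmul_naive(a, b):
    """Reference triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_tiled(a, b, tile=TILE):
    """Blocked multiply: each (bi, bj) block plays the role of one work-group."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for bi in range(0, n, tile):          # work-group row
        for bj in range(0, p, tile):      # work-group column
            for bk in range(0, m, tile):  # march along the shared dimension
                # Stage one tile of A and one tile of B, standing in for a
                # copy from global memory into local memory.
                a_tile = [row[bk:bk + tile] for row in a[bi:bi + tile]]
                b_tile = [row[bj:bj + tile] for row in b[bk:bk + tile]]
                # Each (i, j) pair stands in for one work-item.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        acc = 0
                        for k, a_val in enumerate(a_row):
                            acc += a_val * b_tile[k][j]
                        c[bi + i][bj + j] += acc
    return c
```

For example, `matmul_tiled([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` gives the same result as `matmul_naive` on the same inputs. Each staged tile is reused by every work-item in its block, which is the reuse that reduces global memory traffic on a real device.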

Keywords: matrix multiplication; graphics processing unit (GPU); Open Computing Language (OpenCL); parallel algorithm

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
