Authors: XIAO Han (肖汉); XIAO Shi-yang (肖诗洋) [2]; LI Cai-lin (李彩林); ZHOU Qing-lei (周清雷) [4]
Affiliations: [1] School of Information Science and Technology, Zhengzhou Normal University, Zhengzhou 450044, China; [2] School of Civil Engineering, Northeast Forestry University, Harbin 150040, China; [3] School of Civil and Architectural Engineering, Shandong University of Technology, Zibo, Shandong 255000, China; [4] School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
Source: Journal of Southwest University (Natural Science Edition) (《西南大学学报(自然科学版)》), 2020, No. 11, pp. 147-153 (7 pages)
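Funding: National Natural Science Foundation of China (41601496, 41701525, 61572444); Natural Science Foundation of Shandong Province (ZR2017LD002); Key Research and Development Program of Shandong Province (2018GGX106002).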
Abstract: Based on an analysis of the underlying hardware architecture of the Open Computing Language (OpenCL) platform, this paper optimizes the matrix multiplication algorithm from several angles, including data localization, computing resource utilization, and memory access bandwidth utilization, and accelerates matrix multiplication under the OpenCL architecture. The experimental data show that the OpenCL-based parallel matrix multiplication algorithm is more efficient than the single-threaded CPU algorithm, the multi-threaded algorithm based on Open Multi-Processing (OpenMP), and the parallel algorithm based on Compute Unified Device Architecture (CUDA).
Classification: TP311 [Automation and Computer Technology - Computer Software and Theory]
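
The abstract names data localization as one of the optimization angles for matrix multiplication. The record does not include the authors' kernel, so the following is only a minimal sketch of the general idea (blocked, or tiled, multiplication) in plain Python rather than OpenCL. The function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, chosen for illustration; the staged sub-blocks stand in for what a GPU kernel would typically hold in fast local memory.

```python
def matmul_naive(a, b):
    """Reference triple-loop product of an n x k and a k x m matrix."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def matmul_tiled(a, b, tile=2):
    """Blocked product: work on tile x tile sub-blocks so that each
    staged block of A and B is reused for a whole output tile.
    `tile` is an illustrative parameter, not a value from the paper."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Stage one block of A and one block of B. On a GPU these
                # copies would typically live in local (shared) memory.
                a_tile = [row[p0:p0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[p0:p0 + tile]]
                # Accumulate the partial products into the output tile.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        acc = c[i0 + i][j0 + j]
                        for p, a_val in enumerate(a_row):
                            acc += a_val * b_tile[p][j]
                        c[i0 + i][j0 + j] = acc
    return c
```

Both functions return the same product; the tiled version only reorders the loops so that each staged block is reused many times before the next one is fetched, which is the property that data-localization optimizations generally exploit.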