Authors: CHEN Rui; SUN Yufei; GUO Qiang; SUI Yicheng; ZHOU Zhenhui; SHI Changqing; ZHANG Yuzhi
Affiliations: [1] College of Software, Nankai University, Tianjin 300457, China; [2] Haihe Lab of ITAI, Tianjin 300459, China
Source: Computer Engineering (《计算机工程》), 2023, No. 4, pp. 138-148 (11 pages)
Funding: National Key Research and Development Program of China (2021YFB0300104).
Abstract: In machine learning frameworks such as TensorFlow, the construction, training, and inference of deep learning models rely on deep learning operators. For operators that are called frequently or are computationally heavy, such as convolution and pooling, frameworks generally improve efficiency by calling Deep Neural Network (DNN) libraries. Existing DNN libraries are developed mainly by a few foreign hardware manufacturers, such as Nvidia and AMD, and are optimized for the characteristics of their own hardware. Because these libraries are closed, general-purpose accelerators from other manufacturers are difficult to use for deep learning. To address the lack of DNN library support for domestic accelerators and to enable deep learning models to run on them, a cross-platform generic DNN library was studied. The structure and calling conventions of the open-source library MIOpen were analyzed, a method for modifying and restructuring the library was proposed, and OclDNN, a DNN library based on OpenCL, was implemented. Considering the high popularity of TensorFlow and the specificity and complexity of how it calls DNN libraries, a method for integrating generic DNN libraries into TensorFlow was also studied, and calls to OclDNN were implemented through the OpenCL platform in StreamExecutor. Experimental results indicate that OclDNN produces correct and reliable results on computing devices from Nvidia, Huawei, and other manufacturers. In the same experimental environment, the acceleration performance of deep learning operators using OclDNN is 5 to 60 times higher than that of the traditional CPU parallel algorithm.
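Keywords: Deep Neural Network (DNN) library; deep learning; Open Computing Language (OpenCL); hardware accelerator; TensorFlow framework
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]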