A deep learning programming framework for FT-Matrix DSP+MatrixZone heterogeneous systems  (Cited by: 1)

Authors: KANG Yu-han; SHI Yang; CHEN Zhao-yun; WEN Mei [2]

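Affiliations: [1] School of Information Science and Engineering, Hunan Normal University, Changsha 410081, Hunan, China; [2] College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China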

Source: Computer Engineering & Science, 2023, No. 7, pp. 1149-1158 (10 pages)

Funding: National Natural Science Foundation of China (62002366).

Abstract: Deep learning models iterate quickly and demand high computing power, so mainstream hardware vendors increasingly favor heterogeneous systems that pair a general-purpose processor with an AI-specific accelerator core. However, an AI-specific accelerator core supports only a subset of core operators and has no general-purpose programming capability, so how to deploy deep learning tasks efficiently on such heterogeneous architectures deserves in-depth study. Based on the domestically developed FT-Matrix DSP+MatrixZone heterogeneous system platform, this paper designs and implements a deep learning programming framework called KaiSa. KaiSa analyzes the input parameters of a deep learning model, identifies each operator's type, and assigns the operator to the corresponding computing core. For complex operators, KaiSa automatically searches for the optimal block size based on a performance model, improving the performance of dual-core parallel computing. In addition, to support efficient program development, KaiSa hides all low-level hardware details and gives users a friendly programming environment. Experimental results show that KaiSa achieves performance improvements of up to 39.0%.
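The two mechanisms named in the abstract (routing operators to a core by type, and choosing a block size from a performance model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from KaiSa: the set `ACCEL_OPS`, the function names `assign_core`, `predicted_time`, and `search_split`, the toy cost model (fixed overhead plus FLOPs divided by throughput), and all throughput and overhead numbers are hypothetical.

```python
# Hypothetical sketch; no name or number here comes from KaiSa itself.

# Assumed set of operators that the accelerator core can execute.
ACCEL_OPS = {"conv2d", "matmul"}


def assign_core(op_type: str) -> str:
    """Route an operator to the accelerator if it is supported, else to the DSP."""
    return "MatrixZone" if op_type in ACCEL_OPS else "DSP"


def predicted_time(rows: int, k: int, n: int, throughput: float, overhead: float) -> float:
    """Toy performance model for a (rows x k) @ (k x n) product.

    Cost is a fixed launch overhead plus FLOPs / throughput; a core that
    receives no rows costs nothing.
    """
    if rows == 0:
        return 0.0
    return overhead + (2.0 * rows * k * n) / throughput


def search_split(m, k, n, block_sizes, dsp, accel):
    """Exhaustively search candidate block sizes for a dual-core split.

    A candidate b gives b rows to the DSP and m - b rows to the accelerator.
    Both cores run in parallel, so the predicted time is the slower of the two.
    `dsp` and `accel` are (throughput, overhead) pairs (assumed parameters).
    Returns (best_block, predicted_time), or None if no candidate is valid.
    """
    best = None
    for b in block_sizes:
        if not 0 <= b <= m:
            continue
        t = max(predicted_time(b, k, n, *dsp),
                predicted_time(m - b, k, n, *accel))
        if best is None or t < best[1]:
            best = (b, t)
    return best


# Usage with made-up throughput (FLOP/s) and overhead (s) values.
block, t = search_split(1024, 256, 256, range(0, 1025, 64),
                        dsp=(1e9, 1e-5), accel=(3e9, 2e-5))
print(assign_core("matmul"), assign_core("softmax"), block, t)
```

The exhaustive loop is only one possible search strategy; the abstract does not say how KaiSa explores the block-size space or what its performance model contains.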

Keywords: deep learning; FT-Matrix; systolic accelerator; heterogeneous system; performance optimization

Classification: TP312 [Automation and Computer Technology: Computer Software and Theory]

 
