An Acceleration Strategy for Operator Generation Based on TVM


Authors: GAO Wei [1]; LI Shuailong; MAO Lin; WANG Lei [2]; LI Yingying; HAN Lin

Affiliations: [1] National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, Henan, China; [2] School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, Henan, China; [3] 19th Squadron, 92196 Troop, Qingdao 266000, Shandong, China; [4] School of Cyberspace Security, Information Engineering University, Zhengzhou 450001, Henan, China

Source: Computer Engineering, 2024, No. 8, pp. 353-362 (10 pages)

Funding: Henan Province Major Science and Technology Project (221100210600)

Abstract: With the rapid development of Artificial Intelligence (AI), new operators and underlying hardware emerge continuously, imposing a heavy workload on the development and maintenance of operator libraries. Relying solely on manual optimization to improve the performance and efficiency of AI models quickly runs into bottlenecks. The TVM deep learning compiler alleviates the burden of manual optimization through automated code generation, but it suffers from long search times. To address this issue, this study proposes two optimization strategies for Ansor, TVM's automated code generation framework: a new cost model based on a gradient boosting algorithm, and a scheduling-space pruning optimization based on predefined rules. Both strategies aim to accelerate TVM's automated code generation process, enabling rapid model deployment and providing more efficient solutions for applying AI technology. Experimental results show that with the optimized cost model, model tuning time on the x86 CPU platform is reduced by 30% to 35% without any loss of inference time, while operator performance improves by up to 22%; on the Deep Computing Unit (DCU) platform, model tuning time is reduced by approximately 20%, while average operator performance improves by 5.7%. In addition, the predefined-rule pruning strategy effectively accelerates the convergence of the cost model, and under the original optimal number of iterations, model inference time improves by 7.4%.
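The two strategies described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration only, assuming invented schedule features and a made-up pruning rule; it does not use TVM's actual Ansor interfaces or the paper's real cost model.

```python
# Hedged sketch (not TVM's real API): illustrates the two strategies the paper
# describes for Ansor -- (1) a gradient-boosting cost model that scores
# candidate schedules so only the most promising ones are measured on hardware,
# and (2) predefined-rule pruning that discards implausible schedules before
# scoring. The feature vectors and the rule below are invented stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stand-in for Ansor's per-schedule features (loop extents, vectorization
# flags, cache statistics, ...): random 16-dimensional vectors.
measured_feats = rng.random((200, 16))
# Stand-in for the measured throughput of those schedules (higher is better).
measured_perf = measured_feats @ rng.random(16)

# Strategy 1: train a gradient-boosting cost model on already-measured schedules.
cost_model = GradientBoostingRegressor(n_estimators=100, max_depth=4)
cost_model.fit(measured_feats, measured_perf)

# Strategy 2: prune the candidate space with a predefined rule before scoring.
# Here the (invented) rule keeps only schedules whose first feature, imagined
# as a normalized tile size, lies in a sensible range.
candidates = rng.random((1000, 16))
kept = candidates[(candidates[:, 0] > 0.1) & (candidates[:, 0] < 0.9)]

# Score the surviving candidates and keep the top 10 for real measurement,
# which is the expensive step both strategies try to minimize.
scores = cost_model.predict(kept)
top_k = np.argsort(scores)[::-1][:10]
print("rows of the 10 most promising pruned candidates:", top_k)
```

In real Ansor tuning the measured set grows each round and the cost model is retrained, so pruning before scoring both shrinks the search space and speeds up model convergence, which matches the effect the paper reports.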

Keywords: deep learning compiler; cost model; gradient boosting algorithm; pruning strategy; auto-tuning

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
