Operator Fusion Optimization for Deep Learning Compiler TVM

Authors: GAO Wei [1], WANG Lei [1,2], LI Jianan, LI Shuailong, HAN Lin

Affiliations: [1] National Supercomputing Center in Zhengzhou (Zhengzhou University), Zhengzhou 450001, China; [2] School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China

Source: Computer Science (《计算机科学》), 2025, Issue 5, pp. 58-66 (9 pages)

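Funding: Henan Province Major Science and Technology Project "Research on the Innovation Ecosystem and Applications of Domestic Advanced Computing Platforms" (221100210600).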

Abstract: Operator fusion is a compilation optimization used by deep learning compilers: it merges multiple operators into a single, larger operator, which reduces both computation and memory-access costs. The operator fusion scheme in the deep learning compiler TVM classifies operators by their functional characteristics, defines fusion rules on top of that classification, and then fuses operators with a greedy algorithm. This scheme has two problems. First, fusion rules built on a functional classification are not general enough, so fusion opportunities are missed and larger-granularity fusion cannot be achieved. Second, the greedy fusion algorithm cannot reach the optimal fusion solution. To address these problems, TVM is improved in two ways. Operators are classified by their input/output mapping type, and general fusion rules are designed on this classification to enlarge the granularity of fusion. In addition, a fusion-scheme search algorithm based on dynamic programming and an operator fusion cost evaluation model are proposed, and the search space is pruned so that the algorithm can find an optimized fusion scheme within a reasonable time. To evaluate the scheme, the fusion ratio and inference latency of deep learning models such as VGG-16, Efficient-B0, MobileNet-V1, and YOLO-V4 are measured on platforms including CPU and DCU. The experimental results show that, compared with TVM's original fusion scheme, the proposed scheme improves the fusion ratio by 27% on average and achieves an average inference speedup of 1.75x.
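The abstract does not give the details of the search algorithm, the cost model, or the mapping-type categories. To make the general idea concrete, the following is only an illustrative sketch of a dynamic-programming partition of a linear operator chain into fused groups. Everything in it is an assumption made for illustration and not the paper's method or TVM's API: the two mapping-type labels, the toy rule in `can_fuse`, the toy cost in `group_cost`, the `LAUNCH_COST` constant, and the `max_group` bound that stands in for search-space pruning.

```python
from math import inf

# Hypothetical mapping-type labels (the paper's real categories are not listed in the abstract).
ONE_TO_ONE = "one_to_one"    # each output element depends on one input element
MANY_TO_ONE = "many_to_one"  # each output element depends on many input elements

# Hypothetical fixed overhead paid once per fused group (kernel launch, memory round trip).
LAUNCH_COST = 5.0


def can_fuse(group):
    """Toy fusion rule (assumption): at most one many-to-one operator per group."""
    return sum(1 for _, kind, _ in group if kind == MANY_TO_ONE) <= 1


def group_cost(group):
    """Toy cost model (assumption): sum of per-operator costs plus one fixed overhead."""
    return sum(cost for _, _, cost in group) + LAUNCH_COST


def best_partition(ops, max_group=4):
    """Split a linear chain of (name, kind, cost) operators into contiguous fused groups.

    best[i] holds the minimum total cost of the first i operators.
    max_group bounds the group size, which prunes the candidates tried per step.
    """
    n = len(ops)
    best = [inf] * (n + 1)
    cut = [0] * (n + 1)  # cut[i] = start index of the last group in the best split of ops[:i]
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_group), i):
            group = ops[j:i]
            if not can_fuse(group):
                continue
            candidate = best[j] + group_cost(group)
            if candidate < best[i]:
                best[i] = candidate
                cut[i] = j
    # Walk the cut points backwards to recover the chosen groups.
    groups = []
    i = n
    while i > 0:
        j = cut[i]
        groups.append([name for name, _, _ in ops[j:i]])
        i = j
    groups.reverse()
    return best[n], groups


if __name__ == "__main__":
    chain = [
        ("conv1", MANY_TO_ONE, 10.0),
        ("bn1", ONE_TO_ONE, 1.0),
        ("relu1", ONE_TO_ONE, 1.0),
        ("conv2", MANY_TO_ONE, 10.0),
        ("relu2", ONE_TO_ONE, 1.0),
    ]
    total, groups = best_partition(chain)
    print(total, groups)
```

In this toy setting the two convolutions can never share a group, so the chain needs at least two groups, and the dynamic program finds a two-group split. Several two-group splits have the same cost here, and which one is returned depends on tie-breaking. Real computation graphs are DAGs rather than chains, so an actual implementation would need a richer state than a single prefix index.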

Keywords: deep learning compiler; TVM; operator fusion; fusion rules; dynamic programming

Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]

