Support and Optimization of Multi-Granularity Quantization Framework for Deep Learning Compiler


Authors: WEI Mingkang, LI Jianan, HAN Lin, GAO Wei[2], ZHAO Rongcai, WANG Hongsheng

Affiliations: [1] School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, Henan, China; [2] National Supercomputing Center in Zhengzhou (Zhengzhou University), Zhengzhou 450000, Henan, China

Source: Computer Engineering, 2025, Issue 5, pp. 62-72 (11 pages)

Funding: Henan Province Major Science and Technology Project (221100210600)

Abstract: With the surge in demand from major manufacturers for deploying large models, the single quantization method of the deep learning compiler TVM (Tensor Virtual Machine) suffers accuracy degradation and can no longer satisfy deployment requirements. This paper therefore designs and builds a model quantization framework with selectable granularity, supporting both layer-wise and channel-wise quantization flows and implementing threshold-search and adaptive-rounding optimization algorithms. First, based on the quantization module `relay.quantize`, a framework flow of information annotation, threshold calibration, and quantized-graph realization is constructed, with a granularity attribute added to explicitly identify the quantization method. Second, to address the problem that predefined calibration methods cannot determine effective quantization information, the threshold calibration and weight rounding steps are tuned, improving the accuracy of the quantized model. Experiments test vision networks on the ImageNet dataset: for MobileNetV1, the new quantization scheme reduces the accuracy loss after 8-bit quantization to 2.3%, and tuning further reduces this loss to 0.7%. The results show that the multi-granularity quantization framework effectively reduces quantization error.
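The layer-wise versus channel-wise distinction at the heart of the abstract can be illustrated with a minimal sketch. This is not the paper's TVM implementation; it is a hypothetical numpy example of symmetric int8 quantization, contrasting one scale per tensor (layer-wise) with one scale per output channel (channel-wise), the case where networks such as MobileNetV1 with widely varying per-channel weight ranges benefit from the finer granularity:

```python
import numpy as np

def quantize_per_layer(w, n_bits=8):
    """Symmetric per-layer (per-tensor) quantization: one scale for the whole tensor."""
    qmax = 2 ** (n_bits - 1) - 1          # 127 for int8
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def quantize_per_channel(w, n_bits=8, axis=0):
    """Symmetric per-channel quantization: one scale per output channel (axis 0)."""
    qmax = 2 ** (n_bits - 1) - 1
    # Reduce over every axis except the channel axis.
    reduce_axes = tuple(i for i in range(w.ndim) if i != axis)
    scale = np.abs(w).max(axis=reduce_axes, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
# Weight tensor where one channel dominates the global range -- the situation
# in which a single per-layer scale wastes precision on the other channels.
w = rng.standard_normal((4, 3, 3, 3)).astype(np.float32)
w[0] *= 10.0

q_l, s_l = quantize_per_layer(w)
q_c, s_c = quantize_per_channel(w)
err_layer = np.abs(dequantize(q_l, s_l) - w).mean()
err_channel = np.abs(dequantize(q_c, s_c) - w).mean()
assert err_channel < err_layer  # finer granularity -> lower reconstruction error
```

The per-channel variant carries one extra scale per output channel at negligible storage cost, which is why a selectable-granularity framework can trade a little metadata for a large accuracy gain on range-skewed layers.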

Keywords: model quantization; model deployment; model compression; inference acceleration; deep learning compiler

Classification: TP332 (Automation and Computer Technology - Computer System Architecture)
