Authors: WEI Mingkang; LI Jianan; HAN Lin; GAO Wei[2]; ZHAO Rongcai; WANG Hongsheng (School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, Henan, China; National Supercomputing Center in Zhengzhou (Zhengzhou University), Zhengzhou 450000, Henan, China)
Affiliations: [1] School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, Henan, China; [2] National Supercomputing Center in Zhengzhou (Zhengzhou University), Zhengzhou 450000, Henan, China
Source: Computer Engineering, 2025, Issue 5, pp. 62-72 (11 pages)
Funding: Henan Province Major Science and Technology Project (221100210600).
Abstract: With the surging demand from major vendors for deploying large models, the accuracy loss of the single quantization scheme in the deep learning compiler TVM (Tensor Virtual Machine) can no longer meet deployment requirements. This paper designs and builds a model quantization framework with selectable granularity, supporting both layer-wise and channel-wise quantization flows and implementing threshold-search and adaptive-rounding optimization algorithms. First, based on the quantization module "relay.quantize", a framework flow of information annotation, threshold calibration, and quantized-graph realization is constructed, with a granularity attribute added to explicitly identify the quantization scheme. Second, to address the problem that predefined calibration methods cannot determine effective quantization information, threshold calibration and weight rounding are tuned, improving the accuracy of the quantized model. Experiments test vision networks on the ImageNet dataset: for MobileNetV1, the new quantization scheme reduces the accuracy loss after 8-bit quantization to 2.3%, and tuning further reduces this loss to 0.7%. The results show that the multi-granularity quantization framework effectively reduces quantization error.
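The layer-wise vs. channel-wise distinction at the heart of the abstract can be sketched with a small NumPy experiment. This is an illustrative sketch only, not the paper's TVM implementation: the weight shapes, scale formulas, and all names here are hypothetical, and it uses simple max-based symmetric int8 scaling rather than the paper's threshold-search calibration.

```python
import numpy as np

# Hypothetical demo: compare the reconstruction error of layer-wise (one scale
# per tensor) vs. channel-wise (one scale per output channel) symmetric 8-bit
# quantization on a random conv-style weight tensor.

rng = np.random.default_rng(0)
# Weight shaped (out_channels, in_channels, kh, kw); channels are given very
# different magnitudes, the case where per-channel scales help most.
w = rng.standard_normal((8, 16, 3, 3)) * rng.uniform(0.01, 1.0, (8, 1, 1, 1))

def quant_dequant(w, scale):
    """Symmetric int8 quantize, then dequantize with the given scale(s)."""
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale

# Layer-wise: a single scale derived from the global max magnitude.
scale_layer = np.abs(w).max() / 127.0
err_layer = np.abs(w - quant_dequant(w, scale_layer)).mean()

# Channel-wise: one scale per output channel (axis 0), broadcast over the rest.
scale_chan = np.abs(w).reshape(8, -1).max(axis=1).reshape(8, 1, 1, 1) / 127.0
err_chan = np.abs(w - quant_dequant(w, scale_chan)).mean()

print(f"layer-wise   mean abs error: {err_layer:.6f}")
print(f"channel-wise mean abs error: {err_chan:.6f}")
```

Because each per-channel scale fits that channel's own dynamic range, small-magnitude channels are no longer rounded with a step sized for the largest channel, which is why channel-wise granularity typically shrinks the quantization error the paper measures.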
Keywords: model quantization; model deployment; model compression; inference acceleration; deep learning compiler
Classification: TP332 [Automation and Computer Technology: Computer System Architecture]