A survey of quantization methods for deep neural networks    Cited by: 6


Authors: YANG Chun; ZHANG Ruiyao; HUANG Long; TI Shutong; LIN Jinhui; DONG Zhiwei; CHEN Songlu; LIU Yan[2]; YIN Xucheng[1,3] (School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; USTB−EEasyTech Joint Lab of Artificial Intelligence, Beijing 100083, China)

Affiliations: [1] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China; [2] School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; [3] USTB−EEasyTech Joint Lab of Artificial Intelligence, Beijing 100083, China

Source: Chinese Journal of Engineering (《工程科学学报》), 2023, No. 10, pp. 1613–1629 (17 pages)

Funding: National Major Science and Technology Project on New Generation Artificial Intelligence (2030) (2020AAA0109701); National Natural Science Foundation of China (62076024, 62006018); Fundamental Research Funds for the Central Universities (FRF-IDRY-21-018).

Abstract: The study of deep neural networks has gained widespread attention in recent years, with many researchers proposing network structures that exhibit exceptional performance. A current trend in artificial intelligence (AI) technology is to use large-scale pretrained deep neural network models to improve generalization capability and task-specific performance, particularly in areas such as computer vision and natural language processing. Despite their success, deploying high-performance deep neural network models on edge hardware platforms, such as household appliances and smartphones, remains challenging owing to the high complexity of the network architecture, substantial storage overhead, and computational cost. These factors hinder the availability of AI technologies to the public. Therefore, compressing and accelerating deep neural network models has become a critical issue in promoting their large-scale commercial application. Owing to the growing support for low-precision computation from AI hardware manufacturers, model quantization has emerged as one of the main effective approaches to model compression and acceleration. By reducing the bit width of model parameters and of the intermediate feature maps produced during forward propagation, quantization can substantially reduce memory usage, computation cost, and energy consumption, enabling quantized deep neural network models to run on resource-limited edge devices. However, this approach involves a critical tradeoff between task performance and hardware deployment, which directly affects its practical applicability: quantizing a model to low-bit precision can cause considerable information loss, often resulting in catastrophic degradation of task performance. Thus, alleviating the difficulties of model quantization while maintaining task performance has become a critical research topic. Moreover, because hardware devices and application scenarios differ, model quantization has developed into a multi-branch research problem. By comprehensively surveying quantization techniques from different perspectives and summarizing the strengths and weaknesses of each method, this survey identifies the problems that remain open and indicates possible directions for future development.
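The core mechanism the abstract describes, reducing the bit width of weights and intermediate feature maps, can be illustrated with a minimal sketch of uniform affine (asymmetric) post-training quantization. This is a generic textbook scheme, not code from the surveyed paper; the function names and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Map a float tensor onto num_bits unsigned integers with an affine scheme.

    Returns the integer tensor plus the (scale, zero_point) pair needed
    to approximately reconstruct the original values.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)      # real-value step per integer level
    zero_point = int(round(qmin - x.min() / scale))  # integer that represents real 0.0 offset
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Reconstruct approximate float values from the integer representation."""
    return (q.astype(np.float32) - zero_point) * scale

# Example: an 8-bit quantized "weight" tensor uses 4x less memory than float32,
# at the cost of a per-element reconstruction error on the order of scale / 2.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s, z = quantize_uniform(w, num_bits=8)
w_hat = dequantize(q, s, z)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

The same recipe applies to activations (feature maps), except that their ranges are usually estimated from calibration data rather than read off a fixed tensor; quantization-aware training and mixed-precision methods, both listed in the keywords below, refine this basic scheme to limit the information loss the abstract warns about.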

Keywords: deep neural network; model compression and acceleration; model quantization; quantization-aware training; post-training quantization; mixed-precision quantization

Classification: TP183 [Automation and Computer Technology – Control Theory and Control Engineering]

 
