Ultra-low Loss Quantization Method for Deep Neural Network Compression  (Cited by: 6)


Authors: GONG Cheng; LU Ye; DAI Su-Rong [1,2]; LIU Fang-Xin; CHEN Xin-Wei; LI Tao [1,2,4]

Affiliations: [1] College of Computer Science, Nankai University, Tianjin 300350, China [2] Tianjin Key Laboratory of Network and Data Security Technology (Nankai University), Tianjin 300350, China [3] Industrial Robot Application of Fujian University Engineering Research Center (Minjiang University), Fuzhou, Fujian 350121, China [4] State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190, China

Source: Journal of Software, 2021, No. 8, pp. 2391-2407 (17 pages)

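Funding: National Key Research and Development Program of China (2018YFB2100300); National Natural Science Foundation of China (62002175, 61872200); Natural Science Foundation of Tianjin (19JCZDJC31600, 19JCQNJC00600); Open Project of the State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences) (CARCHB202016, CARCH201905); Industry-University-Research Innovation Fund of Chinese Universities (2020HYA01003); Open Fund of the Industrial Robot Application of Fujian University Engineering Research Center (Minjiang University) (MJUKF-IRA1902).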

Abstract: Deep neural network (DNN) quantization is an efficient model compression method that represents the parameters and intermediate results of model computation with a small bit width. The data bit width directly affects memory footprint, computational efficiency, and energy consumption. Previous work on model quantization lacks effective quantitative analysis, which makes quantization loss hard to predict. This study proposes an ultra-low loss quantization (μL2Q) method for DNN compression that reveals the intrinsic relationship between quantization bit width and quantization loss, guiding bit-width selection and reducing quantization loss. First, the original data are mapped to data following a standard normal distribution. Then, the optimal quantization parameters are searched over equal-width quantization intervals for the target bit width. Finally, μL2Q is integrated into the DNN training process and embedded in two mainstream machine learning frameworks, Caffe and Keras, to support the design and training of end-to-end model compression. Experimental results show that, compared with three groups of state-of-the-art quantization methods at the same bit width, μL2Q maintains higher model accuracy, improving accuracy on typical neural network models by 1.94%, 3.73%, and 8.24%, respectively. Salient object detection experiments further show that μL2Q is capable of handling complex computer vision tasks.
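The first two steps named in the abstract (mapping the data to a standard normal distribution, then searching for the best parameter over equal-width intervals) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the half-integer level layout, the mean-squared-error loss, and the grid of candidate step sizes are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def quantize_uniform(phi, step, bits):
    """Map normalized values onto 2**bits equal-width levels symmetric about zero.

    Assumption: levels sit at half-integer multiples of `step`
    (..., -1.5*step, -0.5*step, 0.5*step, 1.5*step, ...).
    """
    half = 2 ** bits / 2 - 0.5           # largest level index, e.g. 1.5 for 2 bits
    idx = np.floor(phi / step) + 0.5      # index of the interval each value falls in
    idx = np.clip(idx, -half, half)       # saturate values outside the covered range
    return idx * step

def search_step(phi, bits, candidates):
    """Grid-search the interval width that minimizes mean squared quantization loss."""
    losses = [np.mean((phi - quantize_uniform(phi, s, bits)) ** 2) for s in candidates]
    best = int(np.argmin(losses))
    return float(candidates[best]), float(losses[best])

def quantize_weights(w, bits, candidates=None):
    """Normalize, quantize with the best step found, then map back to the original scale."""
    mu, sigma = w.mean(), w.std()         # assumes sigma > 0 (weights are not all equal)
    phi = (w - mu) / sigma                # step 1: shift and scale to zero mean, unit variance
    if candidates is None:
        candidates = np.linspace(0.05, 3.0, 60)   # hypothetical search grid
    step, loss = search_step(phi, bits, candidates)  # step 2: parameter search
    w_q = quantize_uniform(phi, step, bits) * sigma + mu
    return w_q, step, loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(0.1, 0.02, size=10_000)  # synthetic stand-in for a weight tensor
    for bits in (1, 2, 3, 4):
        w_q, step, loss = quantize_weights(weights, bits)
        print(bits, len(np.unique(w_q)), round(step, 2), round(loss, 4))
```

Because the search runs on normalized data, the loss it reports depends only on the bit width and the shape of the distribution, not on the scale of the original weights. That is one way to read the abstract's claim that bit width and quantization loss can be related directly.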

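Keywords: neural network compression; neural network quantization; weight distribution; uniform quantization; optimal solution of quantization loss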

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
