Mixed-Precision Inference with Subgraph Fusion as the Minimum Unit

Authors: CUI Liqun[1], HU Lei (College of Software, Liaoning Technical University, Huludao 125105, China)

Source: Software Guide (《软件导刊》), 2024, No. 6, pp. 44-52 (9 pages)

Funding: Basic Scientific Research Project of Higher Education Institutions of Liaoning Province (LJKMZ20220699).

Abstract: In recent years, convolutional neural networks, as the most important technique in deep learning, have achieved strong results in image classification, object detection, speech recognition, and other fields. Over the same period, deep neural networks built from many convolutional layers have emerged and significantly improved accuracy on a range of tasks. However, network weights are usually restricted to a single uniform precision, which makes the network large relative to the memory available on specific hardware platforms, and a single uniform precision such as floating point 16 (FP16) or INT8 can no longer meet the practical requirements of some model inference workloads. To address this, a mixed-precision inference algorithm is proposed that takes the subgraph as its minimum unit, determines the fusion relationship between adjacent nodes, and enriches the set of available bit widths. First, an FP16 half-precision configuration is added to the search space of the original uniform-precision quantization design, which enlarges the final search space and offers more opportunities to find the optimal solution. Second, following the idea of subgraph fusion, integer linear programming is used to assign precision configurations to the fused subgraphs, and the computational graph is partitioned under three constraints (model size, inference latency, and bit-width operation count) so that the accumulated perturbation error is reduced. Finally, experiments on the ResNet family show that the accuracy loss of the proposed model relative to HAWQ V3 does not exceed 1%, while inference is faster than with other mixed-precision quantization methods: inference speed improves by 18.15% and 19.21%, respectively, on ResNet18, and by 13.15% and 13.70%, respectively, on ResNet50.
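The two steps summarized in the abstract (fusing adjacent nodes into subgraphs, then choosing one precision per subgraph under resource budgets) can be illustrated with a minimal sketch. This is not the paper's implementation: the fusion rule, the operator names, the candidate precisions, and every number in `profiles` are hypothetical, and an exhaustive search stands in for the integer linear program, which is only practical for a toy instance like this one.

```python
from itertools import product

# Candidate precisions per subgraph (assumed set; the abstract only names FP16 and INT8).
PRECISIONS = ("INT4", "INT8", "FP16")

def fuse_chain(ops, fusable):
    """Greedily merge each op into the previous subgraph when the
    (previous op, current op) pair is listed as fusable."""
    subgraphs = []
    for op in ops:
        if subgraphs and (subgraphs[-1][-1], op) in fusable:
            subgraphs[-1].append(op)
        else:
            subgraphs.append([op])
    return subgraphs

def assign_precisions(profiles, budgets):
    """Pick one precision per subgraph that minimizes the summed perturbation
    while keeping total (size, latency, bit operations) within `budgets`.
    Exhaustive search replaces the ILP solver for this toy example."""
    best = None
    for choice in product(PRECISIONS, repeat=len(profiles)):
        picked = [profile[c] for profile, c in zip(profiles, choice)]
        perturbation = sum(row[0] for row in picked)
        totals = [sum(row[i] for row in picked) for i in (1, 2, 3)]
        if all(t <= b for t, b in zip(totals, budgets)):
            if best is None or perturbation < best[1]:
                best = (choice, perturbation)
    return best  # None if no assignment satisfies the budgets

# Hypothetical linear chain of operators and hypothetical fusion pairs.
ops = ["conv", "bn", "relu", "conv", "bn", "add", "relu"]
fusable = {("conv", "bn"), ("bn", "relu"), ("add", "relu")}
subgraphs = fuse_chain(ops, fusable)

# Made-up profile per subgraph: precision -> (perturbation, size, latency, bit ops),
# in arbitrary integer units.
profiles = [
    {"INT4": (9, 1, 2, 4), "INT8": (3, 2, 3, 8), "FP16": (1, 4, 5, 16)},
    {"INT4": (8, 1, 2, 4), "INT8": (2, 2, 3, 8), "FP16": (1, 4, 5, 16)},
    {"INT4": (2, 0, 1, 1), "INT8": (1, 0, 1, 2), "FP16": (0, 0, 2, 4)},
]
budgets = (6, 10, 30)  # hypothetical limits on size, latency, bit operations
best = assign_precisions(profiles, budgets)

print(subgraphs)
print(best)
```

With these made-up numbers, running every subgraph in FP16 exceeds the size budget, so the search has to trade some perturbation for a smaller footprint. In a real setting the search space grows exponentially with the number of subgraphs, which is why an ILP formulation such as the one the abstract describes is preferable to enumeration.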

Keywords: subgraph fusion; mixed-precision inference; constrained problem; optimization solving; GPU acceleration

Classification code: TN911.73 [Electronics and Telecommunications: Communication and Information Systems]
