Fine-Grained Structured Sparse Design for Versatile Tensor Accelerator (面向多功能张量加速器的细粒度结构化稀疏设计)

Authors: ZHAO Huazheng, PANG Shanmin [1]; ZHAO Yinghai [2]; HUA Gaohui, LI Chenyang, DUAN Zhansheng [3]; MEI Kuizhi [4]

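Affiliations: [1] School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China; [2] Beijing Huahang Institute of Radio Measurement, Beijing 100013, China; [3] School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China; [4] College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an 710049, China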

Source: Journal of Xi'an Jiaotong University, 2024, No. 11, pp. 176-184 (9 pages)

Funding: Key Research and Development Program of Xinjiang Uygur Autonomous Region (2022B01008-1); National Natural Science Foundation of China (62076193).

Abstract: To address the compatibility gap between model compression algorithms and the versatile tensor accelerator (VTA), this work improves the classical YOLObile block-wise pruning method to produce an adaptive fine-grained structured sparse design for the accelerator, and evaluates its performance. To match the VTA's loop unrolling across multiple dimensions, the model's weight tensors are partitioned into 32×32 blocks. Self-distillation along the temporal dimension is combined with teacher distillation along the spatial dimension to align features across multiple dimensions. A one-stage iterative training scheme streamlines the computation flow of the original alternating direction method of multipliers (ADMM) algorithm, improving deployed model accuracy while reducing training cost. An adaptive layer pruning rate module distributes the total pruning rate across layers automatically, enabling end-to-end automated pruning. Experimental results show that the improved method reduces floating-point computation by approximately 2.4% and improves the accuracy of compressed models on several tasks, including image classification and object detection, by up to 2.6%. The method offers an efficient, lightweight software solution for the sparse deployment of deep learning models on the VTA.

Keywords: neural network lightweighting; model sparsification; deep learning; versatile tensor accelerator; model deployment

Classification code: TP31 [Automation and Computer Technology: Computer Software and Theory]
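
Illustration: the 32×32 block partitioning mentioned in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `block_column_prune`, the use of a 2-D weight matrix, the choice to prune whole columns inside each block, and the L2-norm ranking are all hypothetical, since the abstract does not specify the pruning criterion. ADMM training, distillation, and the adaptive per-layer rate module are not modelled here.

```python
import numpy as np

BLOCK = 32  # block edge length stated in the abstract


def block_column_prune(weight: np.ndarray, prune_rate: float,
                       block: int = BLOCK) -> np.ndarray:
    """Return a 0/1 mask that zeroes the weakest columns inside each block.

    Assumption: columns within a block are ranked by L2 norm. The abstract
    does not state the ranking criterion, so this is illustrative only.
    """
    rows, cols = weight.shape
    if rows % block or cols % block:
        raise ValueError("this sketch assumes dimensions divisible by the block size")
    if not 0.0 <= prune_rate <= 1.0:
        raise ValueError("prune_rate must lie in [0, 1]")

    n_prune = int(round(prune_rate * block))  # columns removed per block
    mask = np.ones_like(weight)

    # Visit every block x block tile of the weight matrix.
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = weight[r:r + block, c:c + block]
            col_norms = np.linalg.norm(tile, axis=0)
            weakest = np.argsort(col_norms)[:n_prune]
            # Zero the selected columns only within this tile.
            mask[r:r + block, c + weakest] = 0
    return mask


# Usage: prune half of the columns in every 32x32 tile of a 64x96 matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 96)).astype(np.float32)
mask = block_column_prune(w, prune_rate=0.5)
sparse_w = w * mask
sparsity = 1.0 - float(mask.mean())
```

Because every tile loses the same number of columns, the nonzero pattern stays regular at block granularity, which is the property a block-oriented accelerator can exploit. A single global `prune_rate` is used here for simplicity; the paper's adaptive module assigns rates per layer.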
