Authors: ZHAO Huazheng (赵桦筝); PANG Shanmin (庞善民)[1]; ZHAO Yinghai (赵英海)[2]; HUA Gaohui (华高晖); LI Chenyang (李晨阳); DUAN Zhansheng (段战胜)[3]; MEI Kuizhi (梅魁志)[4]
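Affiliations: [1] School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China; [2] Beijing Huahang Institute of Radio Measurement, Beijing 100013, China; [3] School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China; [4] College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an 710049, China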
Source: Journal of Xi'an Jiaotong University (《西安交通大学学报》), 2024, No. 11, pp. 176-184 (9 pages)
Funding: Key Research and Development Program of Xinjiang Uygur Autonomous Region (2022B01008-1); National Natural Science Foundation of China (62076193).
Abstract: To address the poor fit between model compression algorithms and the Versatile Tensor Accelerator (VTA), this work improves the classic YOLObile block-wise pruning method to obtain an adaptive, fine-grained structured sparsity design tailored to this accelerator, and evaluates its performance. To match VTA's unrolling of multiple loop dimensions, the model's weight tensors are partitioned into 32×32 blocks. Self-distillation along the temporal dimension is combined with teacher distillation along the spatial dimension to align features across multiple dimensions. A single-stage iterative training scheme refines the computation flow of the original ADMM algorithm, improving the accuracy of the deployed model while lowering training cost. An adaptive layer pruning rate module distributes the total pruning rate across layers, enabling end-to-end automated pruning. Experimental results show that the improved method reduces floating-point computation by about 2.4% and improves the accuracy of the compressed models on several tasks, including image classification and object detection, with a maximum gain of 2.6%. The method offers an efficient, lightweight software solution for the sparse deployment of deep learning models on VTA.
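The combination of 32×32 block partitioning and ADMM-based pruning described in the abstract can be illustrated on a toy problem. The sketch below is the classic ADMM pruning loop (W-step on the loss plus an augmented penalty, Z-step projecting onto the block-sparse set, dual update), not the paper's improved single-stage variant, whose details are not given here. It assumes a least-squares loss in place of a network's training loss and weight dimensions that are multiples of the block size; the function names and the hyperparameters `rho`, `lr`, `outer`, and `inner` are hypothetical.

```python
import numpy as np


def project_block_sparse(w, keep_tiles, block=32):
    """Euclidean projection onto matrices with at most `keep_tiles` nonzero
    block x block tiles: keep the tiles with the largest Frobenius norms.
    Assumes both dimensions of `w` are multiples of `block`."""
    br, bc = w.shape[0] // block, w.shape[1] // block
    norms = np.sqrt((w.reshape(br, block, bc, block) ** 2).sum(axis=(1, 3)))
    keep = np.zeros(br * bc, dtype=bool)
    # Indices of tiles sorted by descending norm; keep the first `keep_tiles`.
    keep[np.argsort(norms, axis=None)[::-1][:keep_tiles]] = True
    # Expand each tile's keep/drop decision to a full block of ones or zeros.
    mask = np.kron(keep.reshape(br, bc).astype(w.dtype), np.ones((block, block)))
    return w * mask


def admm_block_prune(x, y, keep_tiles, rho=1.0, lr=0.1, outer=30, inner=20, block=32):
    """Classic ADMM pruning on a toy least-squares problem (hypothetical setup):
    minimize ||x @ w - y||^2 / (2n)  subject to  w having <= keep_tiles tiles."""
    n, d_in = x.shape
    d_out = y.shape[1]
    w = np.zeros((d_in, d_out))
    z = np.zeros_like(w)
    u = np.zeros_like(w)
    for _ in range(outer):
        # W-step: a few gradient steps on loss + (rho/2)||w - z + u||^2.
        for _ in range(inner):
            grad = x.T @ (x @ w - y) / n + rho * (w - z + u)
            w -= lr * grad
        # Z-step: project w + u onto the block-sparse constraint set.
        z = project_block_sparse(w + u, keep_tiles, block)
        # Dual (scaled multiplier) update.
        u += w - z
    # Final hard prune: zero every tile outside the selected support.
    return project_block_sparse(w, keep_tiles, block)


# Toy data: a 64 x 64 weight matrix with only 2 of its 4 tiles nonzero.
rng = np.random.default_rng(0)
w_true = np.zeros((64, 64))
w_true[:32, :32] = rng.standard_normal((32, 32))
w_true[32:, 32:] = rng.standard_normal((32, 32))
x = rng.standard_normal((512, 64))
y = x @ w_true + 0.01 * rng.standard_normal((512, 64))
w_hat = admm_block_prune(x, y, keep_tiles=2)
```

Because the projection keeps or drops whole tiles, the resulting sparsity is aligned to 32×32 tiles instead of being scattered element by element.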
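The abstract does not say how the adaptive layer pruning rate module splits the total pruning rate. One simple, hypothetical way to turn a single global rate into per-layer rates is to rank 32×32 tiles across all layers: normalize each layer's tile norms by that layer's mean so different weight scales are comparable, then prune the globally weakest tiles. The sketch assumes 2-D weight matrices (for example, convolution kernels reshaped to output channels × input features) and invented function names; it illustrates the idea of automatic allocation and is not the paper's module.

```python
import numpy as np


def tile_norms(weight, block=32):
    """Frobenius norm of every block x block tile of a 2-D weight matrix.
    Edge tiles are zero-padded, so partially filled tiles tend to score lower."""
    rows, cols = weight.shape
    padded = np.pad(weight, ((0, (-rows) % block), (0, (-cols) % block)))
    br, bc = padded.shape[0] // block, padded.shape[1] // block
    tiles = padded.reshape(br, block, bc, block)
    return np.sqrt((tiles ** 2).sum(axis=(1, 3))).ravel()


def allocate_layer_rates(weights, total_rate, block=32):
    """Hypothetical heuristic: split one global tile-pruning budget across layers.

    Each layer's tile norms are divided by that layer's mean tile norm; the
    globally lowest-scoring `total_rate` fraction of tiles is then marked for
    pruning. Returns the resulting fraction of pruned tiles in each layer.
    """
    scores = [tile_norms(w, block) for w in weights]
    scores = [s / (s.mean() + 1e-12) for s in scores]
    flat = np.concatenate(scores)
    n_prune = int(round(total_rate * flat.size))
    pruned = np.zeros(flat.size, dtype=bool)
    pruned[np.argsort(flat, kind="stable")[:n_prune]] = True
    # Slice the global decision vector back into per-layer segments.
    rates, start = [], 0
    for s in scores:
        rates.append(float(pruned[start:start + s.size].mean()))
        start += s.size
    return rates


rng = np.random.default_rng(0)
layers = [
    rng.standard_normal((64, 64)),         # 2 x 2 = 4 tiles
    0.1 * rng.standard_normal((128, 64)),  # 4 x 2 = 8 tiles, smaller weight scale
    rng.standard_normal((40, 70)),         # padded to 64 x 96: 2 x 3 = 6 tiles
]
rates = allocate_layer_rates(layers, total_rate=0.5)
```

The per-layer normalization is a design choice: without it, a layer whose weights happen to be small overall (like the second layer above) would absorb most of the pruning budget.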
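The abstract combines temporal self-distillation with spatial teacher distillation but does not give the loss. A common way to combine two distillation targets is a weighted sum of temperature-scaled KL terms added to the task loss. The sketch below assumes that the "temporal" target is the logits of an earlier snapshot of the student and the "spatial" target is a separate teacher's logits; the weights `alpha` and `beta`, the temperature `T`, and all function names are hypothetical, and the paper's feature-level alignment may differ.

```python
import numpy as np


def softmax(z, T=1.0):
    """Row-wise softmax of logits z at temperature T (numerically stabilized)."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_kl(student_logits, target_logits, T=4.0):
    """KL(target || student) on temperature-softened distributions, scaled by
    T^2 as in standard knowledge distillation, averaged over the batch."""
    p_t = softmax(target_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    log_p_t = np.log(p_t + 1e-12)
    return float((p_t * (log_p_t - log_p_s)).sum(axis=-1).mean() * T * T)


def combined_distill_loss(student, teacher, snapshot, labels, alpha=0.5, beta=0.5, T=4.0):
    """Hard-label cross-entropy plus a teacher (spatial) term and a
    snapshot (temporal self-distillation) term; weights are illustrative."""
    probs = softmax(student)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return float(ce + alpha * kd_kl(student, teacher, T) + beta * kd_kl(student, snapshot, T))


rng = np.random.default_rng(0)
student = rng.standard_normal((8, 10))
teacher = rng.standard_normal((8, 10))
snapshot = student + 0.1 * rng.standard_normal((8, 10))  # stand-in for an earlier student
labels = rng.integers(0, 10, size=8)
loss = combined_distill_loss(student, teacher, snapshot, labels)
```

When the teacher and the snapshot both agree exactly with the student, both KL terms vanish and the loss reduces to plain cross-entropy.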
Keywords: neural network lightweighting; model sparsification; deep learning; versatile tensor accelerator; model deployment
CLC number: TP31 [Automation and Computer Technology: Computer Software and Theory]