Exploring non-zero position constraints: an algorithm-hardware co-designed DNN sparse training method


Authors: WANG Miao [1]; ZHANG Shengbing [1]; ZHANG Meng [1] (School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China)

Affiliation: [1] School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China

Source: Journal of Northwestern Polytechnical University, 2025, No. 1, pp. 119-127 (9 pages)

Abstract: On-device learning enables edge devices to continuously adapt to new data for AI applications. Leveraging sparsity to eliminate redundant computation and storage usage during training is a key approach to improving the learning efficiency of edge deep neural networks (DNNs). However, due to the lack of assumptions about non-zero positions, existing sparse training works must pay a high runtime cost to identify and allocate zero positions and to load-balance the resulting irregular computations, which keeps them far from the ideal speedup. If the non-zero position constraints of operands during training could be predicted in advance, these processing overheads could be skipped, improving sparse training performance and energy efficiency. Targeting the sparse training process, this paper explores the position constraint rules between operands for three activation functions typical of edge scenarios and, based on these rules, proposes: (1) a hardware-friendly sparse training algorithm that reduces the computation and storage pressure of all three training phases; and (2) an energy-efficient sparse training accelerator that estimates non-zero positions in parallel with the forward-propagation computation, so that the runtime processing cost is hidden by parallel execution. Experiments show that the proposed method achieves 2.2×, 1.38×, and 1.46× higher energy efficiency than a dense accelerator and two other sparse training works, respectively.
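The abstract's core observation — that the non-zero positions of backward operands are constrained by information already available after the forward pass — can be illustrated for ReLU, one of the activation functions typical of edge scenarios: wherever the forward activation is zero, the gradient with respect to the pre-activation is also forced to zero, so the forward mask predicts exactly which backward positions need computing. Below is a minimal NumPy sketch of this idea; the variable names and the toy sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(8)        # pre-activation values
y = np.maximum(x, 0.0)        # ReLU forward pass
mask = x > 0                  # non-zero position mask, known after forward

g_y = np.random.randn(8)      # upstream gradient dL/dy from the next layer

# Standard dense backward: multiply everywhere, then most results are zero
g_x_dense = g_y * mask

# Position-constrained backward: compute only where the mask permits,
# skipping the zero positions entirely
g_x_sparse = np.zeros_like(x)
idx = np.nonzero(mask)[0]
g_x_sparse[idx] = g_y[idx]

assert np.allclose(g_x_dense, g_x_sparse)
```

Because the mask is a byproduct of the forward pass, an accelerator can derive these positions ahead of the backward phase, which is what allows the runtime identification cost to be overlapped with other computation rather than paid on the critical path.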

Keywords: sparse training; non-zero position constraint; DNN; sparse accelerator

CLC number: TP391.4 [Automation and Computer Technology — Computer Application Technology]

 
