A Sparsity-Aware Convolutional Neural Network Accelerator with Flexible Parallelism

Cited by: 3


Authors: YUAN Hai-ying; ZENG Zhi-yong; CHENG Jun-peng (Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China)

Affiliation: [1] Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Source: Acta Electronica Sinica (《电子学报》), 2022, Issue 8, pp. 1811-1818 (8 pages)

Abstract: Large-scale convolutional neural networks involve high computational complexity and heavy resource overhead, which greatly increases the hardware deployment cost of deep learning algorithms. Exploiting the information redundancy of sparse activations between layers during inference is a promising way to reduce inference latency and power consumption at low resource cost and with almost lossless network accuracy. To address the low utilization of operation modules caused by coarse-grained control in sparse convolutional neural network accelerators, this paper proposes an FPGA-based sparsity-aware accelerator architecture with flexible parallelism. Convolution operation modules are flexibly scheduled based on an operation-clustering idea, and the parallelism of input channels and output activations is adjusted online according to the structure of each convolutional layer. In addition, a parallel propagation mode for input data is designed according to the data consistency of parallel output-activation computation. The proposed hardware architecture is implemented on a Xilinx VC709 board; it contains 1024 multiply-accumulate units and provides a theoretical peak of 409.6 GOP/s. Measured throughput reaches 325.8 GOP/s on the VGG-16 model, equivalent to 794.63 GOP/s for the same accelerator without sparse-activation optimization, and its performance exceeds 4.6 times that of the baseline model.
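The abstract's core idea can be illustrated with a small numerical sketch. The code below is not the paper's hardware design; it is a conceptual model, with illustrative layer sizes, of how skipping multiply-accumulate (MAC) operations on zero input activations (the sparsity that ReLU introduces between layers) reduces the number of MACs an accelerator must issue, which is why the measured 325.8 GOP/s is equivalent to 794.63 GOP/s of dense computation.

```python
import numpy as np

# Conceptual sketch (not the paper's RTL): count the MACs a dense
# accelerator issues for one conv layer versus a sparsity-aware one
# that skips MACs whose input activation is zero.
# All tensor sizes here are illustrative, not taken from the paper.

rng = np.random.default_rng(0)

C_in, H, W = 64, 14, 14                  # input channels and spatial size
acts = rng.standard_normal((C_in, H, W))
acts[acts < 0] = 0.0                     # ReLU creates the zero activations

K = 3                                    # 3x3 kernel
C_out = 128                              # output channels

# Dense accelerator: every MAC is issued regardless of operand value.
dense_macs = C_out * C_in * K * K * (H - K + 1) * (W - K + 1)

# Sparsity-aware accelerator: issued MACs scale with the non-zero fraction.
nz_fraction = np.count_nonzero(acts) / acts.size
sparse_macs = int(dense_macs * nz_fraction)

speedup = dense_macs / sparse_macs
print(f"non-zero fraction: {nz_fraction:.2f}, effective speedup: {speedup:.2f}x")
```

For comparison, the paper's figures imply an effective reduction of about 794.63 / 325.8 ≈ 2.44x in issued operations from activation sparsity on VGG-16, the same order as the roughly 50% post-ReLU sparsity modeled here.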

Keywords: FPGA; convolutional neural network; hardware acceleration; sparsity awareness; parallel computing

Classification: TN47 [Electronics and Telecommunications: Microelectronics and Solid-State Electronics]

 
