Convolutional neural network inference and training vectorization method for multicore vector accelerators

Cited by: 1

Authors: CHEN Jie[1]; LI Cheng; LIU Zhong[1] (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

Affiliation: [1] College of Computer Science, National University of Defense Technology, Changsha 410073, Hunan, China

Source: Computer Engineering & Science (《计算机工程与科学》), 2024, No. 4, pp. 580-589 (10 pages)

Funding: Fund of the National Key Laboratory of Parallel and Distributed Processing (2021-KJWPDL-11).

Abstract: With the widespread application of deep learning, represented by convolutional neural networks (CNNs), the computational cost of neural network models has grown rapidly, driving the development of deep learning accelerators. How to accelerate and optimize neural network models according to the architectural characteristics of the accelerator hardware has become a research focus. For VGG network model inference and training on FT-M7004, an independently designed multicore vector accelerator, vectorized mapping methods are proposed for core operators such as convolution, pooling, and fully connected layers. Optimization strategies including SIMD vectorization, DMA double-buffered transfer, and weight sharing exploit the architectural advantages of the vector accelerator and achieve high computational efficiency. Experimental results show that on the FT-M7004 platform, the average computational efficiency of convolution layer inference and training reaches 86.62% and 69.63%, respectively, and that of fully connected layer inference and training reaches 93.17% and 81.98%, respectively. The inference computational efficiency of the VGG network model on FT-M7004 exceeds that of the GPU platform by more than 20%.

Keywords: multicore vector accelerator; convolutional neural network; inference algorithm; training algorithm

Classification: TP319 [Automation and Computer Technology: Computer Software and Theory]
