Design and implementation of dual-mode configurable memory architecture for CNN accelerator  


Authors: SHAN Rui (山蕊); LI Xiaoshuo; GAO Xu; HUO Ziqing (School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, P.R. China)

Affiliation: [1] School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, P.R. China

Source: High Technology Letters (高技术通讯, English edition), 2024, No. 2, pp. 211-220 (10 pages)

Funding: Supported by the National Key R&D Program of China (No. 2022ZD0119001); the National Natural Science Foundation of China (No. 61834005, 61802304); the Education Department of Shaanxi Province (No. 22JY060); the Shaanxi Provincial Key Research and Development Plan (No. 2024GX-YBXM-100).

Abstract: With the rapid development of deep learning algorithms, their computational complexity and functional diversity are increasing rapidly. However, the gap between high computational density and insufficient memory bandwidth under the traditional von Neumann architecture is widening. An analysis of the algorithmic characteristics of convolutional neural networks (CNN) shows that the memory access characteristics of convolution (CONV) and fully connected (FC) operations are very different. Based on this observation, a dual-mode reconfigurable distributed memory architecture for a CNN accelerator is designed. It can be configured in Bank mode or first-in first-out (FIFO) mode to accommodate the access needs of different operations. In addition, a programmable memory control unit is designed, which controls the dual-mode configurable distributed memory architecture through customized access instructions and reduces data access delay. The proposed architecture is verified and tested through parallel implementations of several CNN algorithms. The experimental results show that the peak bandwidth reaches 13.44 GB/s at an operating frequency of 120 MHz, which is 1.40, 1.12, 2.80 and 4.70 times the peak bandwidth of existing works.
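As an illustration only, the two access modes and the headline bandwidth figure can be sketched in Python. This is a toy software model, not the authors' hardware design: the names `Mode`, `DualModeMemory` and `bytes_per_cycle` are hypothetical, clearing the contents on reconfiguration is an assumption, and the arithmetic assumes 1 GB = 10^9 bytes.

```python
from collections import deque
from enum import Enum
from typing import Optional


class Mode(Enum):
    BANK = "bank"  # addressed access: any cell can be read or written
    FIFO = "fifo"  # streaming access: data leaves in arrival order


class DualModeMemory:
    """Toy model of one memory block configurable as Bank or FIFO (hypothetical)."""

    def __init__(self, depth: int, mode: Mode = Mode.BANK) -> None:
        self.depth = depth
        self.configure(mode)

    def configure(self, mode: Mode) -> None:
        # Assumption: switching modes clears the stored data in this model.
        self.mode = mode
        self._cells = [0] * self.depth
        self._queue = deque()

    def write(self, value: int, addr: Optional[int] = None) -> None:
        if self.mode is Mode.BANK:
            if addr is None:
                raise ValueError("Bank mode requires an address")
            self._cells[addr] = value
        else:
            if len(self._queue) >= self.depth:
                raise OverflowError("FIFO is full")
            self._queue.append(value)

    def read(self, addr: Optional[int] = None) -> int:
        if self.mode is Mode.BANK:
            if addr is None:
                raise ValueError("Bank mode requires an address")
            return self._cells[addr]
        # FIFO mode ignores the address and returns the oldest entry.
        return self._queue.popleft()


def bytes_per_cycle(bandwidth_gb_s: float, freq_mhz: float) -> float:
    """Bytes moved per clock cycle, assuming 1 GB = 1e9 bytes."""
    return bandwidth_gb_s * 1e9 / (freq_mhz * 1e6)
```

With the figures reported in the abstract, `bytes_per_cycle(13.44, 120)` works out to 112 bytes (896 bits) per clock cycle under the stated unit assumption. The abstract does not say how that width is divided among the distributed memory blocks.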

Keywords: distributed memory structure; neural network accelerator; reconfigurable array processor; configurable memory structure

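Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]; TP333 [Automation and Computer Technology: Control Science and Engineering]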

 
