Design of FPGA Based Convolutional Neural Network Co-Processor

Cited by: 7

Authors: YANG Yichen; ZHANG Guohe [1]; LIANG Feng [1]; HE Ping; WU Bin [1]; GAO Zhenting (School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China)

Affiliation: [1] School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China

Source: Journal of Xi'an Jiaotong University, 2018, No. 7, pp. 153-159 (7 pages)

Funding: National Natural Science Foundation of China (61474093)

Abstract: In the era of big data, deep, large-scale deep learning network models place exponentially growing demands on computing resources and memory bandwidth during prediction, and the industry's conventional CPU+GPU solution is poorly suited to increasingly common mobile and embedded application scenarios. To address these problems, we propose a heterogeneous acceleration design: a convolutional neural network co-processor based on a programmable logic device (FPGA). The design follows a general-purpose model approach, so it is programmable and compatible with a variety of network models for hardware acceleration. It is also scalable: within the limits of the available hardware resources, it can be extended to multiple cores to multiply performance. The convolution module exploits hardware parallelism and data reuse to improve hardware resource utilization and computing efficiency. A suitably configured multi-level buffer structure reduces how often the co-processor reads and writes external memory and how much external memory bandwidth it occupies, and it improves communication efficiency inside the module. Experiments on a Xilinx VC707 evaluation board show a test-set accuracy of 99% for MNIST-LeNet and 80% for CIFAR-10, with a floating-point operation rate of 5.511×10^10 s^-1. The overall performance is approximately twice that of an Intel Xeon E5-2640 v4 general-purpose server processor and reaches the mainstream level of contemporary FPGA solutions.
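As a rough illustration of why data reuse and on-chip buffering reduce external memory traffic, the sketch below counts input-pixel fetches for a single-channel 2-D convolution (stride 1, no padding) with and without reuse. It is not the paper's architecture: the function name `external_reads`, the 32×32 input and the 5×5 kernel are assumptions chosen only for illustration.

```python
def external_reads(height, width, kernel, buffered):
    """Count input-pixel fetches from external memory for one 2-D convolution
    (single channel, stride 1, no padding). Hypothetical helper for illustration."""
    if buffered:
        # With an on-chip buffer, each input pixel is fetched once and reused.
        return height * width
    out_h = height - kernel + 1
    out_w = width - kernel + 1
    # Without reuse, every output pixel re-fetches its full kernel window.
    return out_h * out_w * kernel * kernel


naive = external_reads(32, 32, 5, buffered=False)
reused = external_reads(32, 32, 5, buffered=True)
print(naive, reused, round(naive / reused, 2))  # prints: 19600 1024 19.14
```

Under these assumed sizes, reuse cuts input fetches by roughly a factor of 19. A real design would also have to account for weights, output writes and multiple channels.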

Keywords: deep learning; convolutional neural network; programmable logic device

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
