Authors: YANG Yichen; ZHANG Guohe [1]; LIANG Feng [1]; HE Ping; WU Bin [1]; GAO Zhenting (School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China)
Affiliation: [1] School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Source: Journal of Xi'an Jiaotong University (《西安交通大学学报》), 2018, No. 7, pp. 153-159 (7 pages)
Funding: National Natural Science Foundation of China (61474093)
Abstract: In the era of big data, deep, large-scale deep learning network models demand exponentially growing computing resources and memory bandwidth during inference, and the industry's conventional CPU+GPU solution is difficult to apply to increasingly common mobile embedded scenarios. To address these problems, this paper proposes a heterogeneous acceleration design for a convolutional neural network co-processor based on a field-programmable gate array (FPGA). The design follows a general-purpose model approach: it is programmable and compatible with a variety of network models for hardware acceleration. It is also scalable, allowing multi-core expansion within the limits of the available hardware resources to double performance. The convolution module exploits hardware parallelism and data reuse to improve hardware resource utilization and computing efficiency. A suitably configured multi-level cache structure reduces how often the co-processor reads and writes external memory and how much of its bandwidth it occupies, and improves communication efficiency inside the module. Experiments on a Xilinx VC707 evaluation board show a test-set accuracy of 99% for MNIST-LeNet and 80% for CIFAR-10, with a floating-point computing speed of 5.511×10^10 s^-1. The overall performance is about twice that of an Intel Xeon E5-2640 V4 general-purpose server processor and reaches the mainstream level of contemporary FPGA solutions.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]