The Progress and Trends of FPGA-Based Accelerators in Deep Learning (Cited by: 63)


Authors: WU Yan-Xia [1], LIANG Kai, LIU Ying [2], CUI Hui-Min [2]

Affiliations: [1] College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China; [2] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Source: Chinese Journal of Computers, 2019, Issue 11, pp. 2461-2480 (20 pages)

Fund: Supported by the National High Technology Research and Development Program of China (863 Program) (2016YFB1000402), the Natural Science Foundation of Heilongjiang Province (F2018008), and the Harbin Outstanding Young Talents Fund (2017RAYXJ016)

Abstract: With the arrival of the big data era, deep learning plays a critical role in extracting valuable information from massive data and has been widely applied in domains such as computer vision, speech recognition and natural language processing. Deep learning algorithms involve a very large number of parameters, yet their computation reduces largely to matrix multiplications and multiply-accumulate operations built on a simple computational model. At the same time, research and industry demand ever-higher accuracy, and the increasingly complex models with ever more weights that win image classification and object detection contests make speeding up the inference and training of deep learning ever more important. This paper reviews one approach to this problem: accelerating deep learning on FPGAs. First, starting from the characteristics and development trends of deep learning algorithms, especially convolutional and recurrent neural networks, the paper explains why FPGAs fit this workload, what benefits they offer over CPU-only or GPU solutions, what is needed to enable them, and what technical challenges FPGA acceleration faces. Second, because CPU-FPGA is currently the most popular architecture but can be set up in different ways, and data communication between the CPU and the FPGA is one of the key factors affecting acceleration performance, the paper introduces CPU-FPGA platforms from the aspects of SoC FPGA and standard FPGA and comparatively analyzes how the two models differ in CPU-FPGA data exchange. Next, after presenting the development environments and tools for FPGA-accelerated deep learning, from high-level languages such as C and OpenCL to hardware description languages such as Verilog, the paper describes in detail the design schemes for accelerating convolutional neural networks on FPGAs from three aspects: hardware architecture, design methodology and optimization strategies. Finally, it looks ahead to future research on FPGA acceleration of deep learning algorithms.
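The abstract observes that deep learning workloads consist mainly of matrix multiplications and multiply-accumulate (MAC) operations, which is exactly the loop structure that FPGA CNN accelerators map onto parallel DSP and on-chip memory resources. Below is a minimal, self-contained C sketch of the convolution loop nest such accelerators target; it is not taken from the paper, and all dimensions and names (IN_CH, OUT_CH, K and so on) are illustrative assumptions. In an HLS flow using C or OpenCL, the inner MAC loops would typically be pipelined and unrolled, and choosing the loop order, tiling factors and on-chip buffering corresponds to the hardware-structure and optimization-strategy decisions the survey discusses.

/*
 * Minimal sketch (illustrative, not from the paper): the multiply-accumulate
 * loop nest of one convolutional layer, i.e. the computation that FPGA CNN
 * accelerators implement with parallel DSP blocks. All sizes are assumptions.
 */
#include <stdio.h>

#define IN_CH   3                   /* input feature-map channels        */
#define OUT_CH  8                   /* output feature-map channels       */
#define IN_DIM  8                   /* input feature-map height/width    */
#define K       3                   /* kernel height/width               */
#define OUT_DIM (IN_DIM - K + 1)    /* output size, no padding, stride 1 */

void conv_layer(const float in[IN_CH][IN_DIM][IN_DIM],
                const float w[OUT_CH][IN_CH][K][K],
                float out[OUT_CH][OUT_DIM][OUT_DIM])
{
    for (int oc = 0; oc < OUT_CH; ++oc)           /* output channel */
        for (int y = 0; y < OUT_DIM; ++y)         /* output row     */
            for (int x = 0; x < OUT_DIM; ++x) {   /* output column  */
                float acc = 0.0f;
                /* The three inner loops are pure MACs; an HLS tool would
                 * typically pipeline and unroll them to exploit parallelism. */
                for (int ic = 0; ic < IN_CH; ++ic)
                    for (int ky = 0; ky < K; ++ky)
                        for (int kx = 0; kx < K; ++kx)
                            acc += in[ic][y + ky][x + kx] * w[oc][ic][ky][kx];
                out[oc][y][x] = acc;
            }
}

int main(void)
{
    static float in[IN_CH][IN_DIM][IN_DIM];
    static float w[OUT_CH][IN_CH][K][K];
    static float out[OUT_CH][OUT_DIM][OUT_DIM];

    /* Simple test data: all-ones input, all-0.1 weights. */
    for (int c = 0; c < IN_CH; ++c)
        for (int y = 0; y < IN_DIM; ++y)
            for (int x = 0; x < IN_DIM; ++x)
                in[c][y][x] = 1.0f;
    for (int o = 0; o < OUT_CH; ++o)
        for (int c = 0; c < IN_CH; ++c)
            for (int ky = 0; ky < K; ++ky)
                for (int kx = 0; kx < K; ++kx)
                    w[o][c][ky][kx] = 0.1f;

    conv_layer(in, w, out);
    /* 27 MACs of 1.0 * 0.1, so each output is about 2.7. */
    printf("out[0][0][0] = %f\n", out[0][0][0]);
    return 0;
}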

Keywords: deep learning; neural network; CPU-FPGA; hardware acceleration; FPGA

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]

 
