Authors: ZHANG Liguo[1], HUANG Wenhan, JIN Mei[1] (School of Electrical Engineering, Yanshan University, Qinhuangdao 066004)
Source: Chinese High Technology Letters (《高技术通讯》), 2023, No. 10, pp. 1060-1067 (8 pages)
Funding: Supported by the National Key Research and Development Program of China (2020YFB1711001).
Abstract: The traditional platforms for convolutional neural networks are central processing units (CPUs) and graphics processing units (GPUs), whose size and power consumption do not suit lightweight applications, while the development cost of dedicated accelerators on lightweight application-specific integrated circuit (ASIC) platforms cannot keep pace with increasingly complex and deep network structures. To address these problems, a convolutional neural network (CNN) accelerator based on a field programmable gate array (FPGA) is designed that suits lightweight application scenarios and has a low development cost. A floating-point adder and a floating-point multiplier are combined into the basic unit of the convolution operation, so that a 16-bit floating-point multiply-accumulate operation consumes only one digital signal processor (DSP) resource. An activation layer module based on the ReLU function is designed for the computing characteristics of the FPGA. The modules of each layer have adjustable parallelism, so that performance, power consumption, and area can be balanced according to the resources of the platform. A SoftMax module simplified with comparators is also designed. Experimental results show that, at an operating frequency of 100 MHz, the peak computing performance reaches 44.8 GFLOPS with a power consumption of only 4.51 W.
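Keywords: field programmable gate array (FPGA); convolutional neural network (CNN); hardware accelerator; parallelism
Classification: TN791 [Electronics and Telecommunications: Circuits and Systems]; TP183 [Automation and Computer Technology: Control Theory and Control Engineering]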
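The headline figures in the abstract (44.8 GFLOPS peak, 100 MHz clock, 4.51 W) can be turned into per-cycle and per-watt numbers with simple arithmetic. The sketch below does only that. The convention that one multiply-accumulate counts as 2 FLOPs is an assumption, not something the abstract states, and the function names are hypothetical.

```python
# Figures quoted in the abstract.
PEAK_GFLOPS = 44.8   # peak computing performance
CLOCK_MHZ = 100.0    # operating frequency
POWER_W = 4.51       # reported power

# ASSUMPTION (not stated in the abstract): one multiply-accumulate
# (MAC) is counted as 2 floating-point operations.
FLOPS_PER_MAC = 2


def flops_per_cycle(peak_gflops: float, clock_mhz: float) -> float:
    """Floating-point operations completed per clock cycle at peak."""
    return (peak_gflops * 1e9) / (clock_mhz * 1e6)


def macs_per_cycle(peak_gflops: float, clock_mhz: float) -> float:
    """Implied parallel MACs per cycle under the 2-FLOPs-per-MAC assumption."""
    return flops_per_cycle(peak_gflops, clock_mhz) / FLOPS_PER_MAC


def gflops_per_watt(peak_gflops: float, power_w: float) -> float:
    """Energy efficiency at peak throughput."""
    return peak_gflops / power_w


if __name__ == "__main__":
    print(f"FLOPs per cycle: {flops_per_cycle(PEAK_GFLOPS, CLOCK_MHZ):.1f}")
    print(f"MACs per cycle:  {macs_per_cycle(PEAK_GFLOPS, CLOCK_MHZ):.1f}")
    print(f"GFLOPS per watt: {gflops_per_watt(PEAK_GFLOPS, POWER_W):.2f}")
```

With the quoted figures this works out to about 448 FLOPs per cycle and roughly 9.93 GFLOPS/W. Under the stated assumption, that corresponds to about 224 multiply-accumulate operations per cycle. How the design actually distributes them across layers is not given in the abstract.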