Authors: FAN Di; WANG Jian [1]; LAI Jinmei [1]
Affiliation: [1] State Key Laboratory of ASIC & Systems, Fudan University, Shanghai 201203, China
Source: Journal of Fudan University (Natural Science), 2020, No. 5, pp. 575-584 (10 pages)
Abstract: Many advanced field programmable gate arrays (FPGAs) from Xilinx and Intel use digital signal processing (DSP) blocks built around multipliers with a relatively large fixed bit width, and these blocks often cannot support low-bit-width multiply-accumulate (MAC) operations efficiently. To address this problem, this paper proposes a new DSP block that supports low-bit-width MAC. While retaining the functions of the Xilinx DSP48E1, the new block combines data shifting, multiplier splitting, and the single instruction multiple data (SIMD) function of the post-adder, so that it can compute in parallel either two 8-bit MACs or two pairs of 4-bit MACs that share a common multiplier, with enough guard bits reserved to prevent overflow. Splitting the multiplier shortens partial-product compression time, and the new functions raise DSP block utilization, so fewer DSP blocks are needed to compute multiple low-bit-width MACs and the total area used is smaller. Experimental results show that, compared with a baseline DSP that implements the DSP48E1 functions, the new DSP is 9% faster. When implementing twice as many 8-bit MACs, and when implementing four times as many 4-bit MACs with shared multipliers, the total DSP block area is reduced by 40.8% in both cases, while the area of a single DSP block increases by 18%. Compared with DSP blocks supporting low-bit-width MAC proposed in other literature, the new DSP block offers stronger support for 4-bit MAC, and its modifications are better suited to the functional characteristics of Xilinx DSP blocks.
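The role of data shifting and guard bits can be illustrated with a small software model. The sketch below is not the circuit proposed in the paper. It only shows the general packing idea under stated assumptions: operands are unsigned 8-bit values, both MACs share the same multiplier `w`, and the function name `packed_mac` and the default shift of 24 bits are hypothetical choices made for illustration.

```python
def packed_mac(pairs, weights, shift=24):
    """Accumulate sum(a*w) and sum(b*w) using one wide multiply per step.

    Assumptions (illustrative only): a, b, w are unsigned 8-bit values,
    and the two MACs share the multiplier w.
    """
    # Each 8-bit product is at most 255*255, so the low field needs 16 bits
    # plus guard bits; refuse inputs that could overflow into the high field.
    assert len(weights) * 255 * 255 < (1 << shift), "not enough guard bits"

    acc = 0
    for (a, b), w in zip(pairs, weights):
        assert 0 <= a < 256 and 0 <= b < 256 and 0 <= w < 256
        packed = (a << shift) | b   # place a above b, separated by `shift` bits
        acc += packed * w           # equals ((a*w) << shift) + b*w

    low_mask = (1 << shift) - 1
    return acc >> shift, acc & low_mask  # (sum of a*w, sum of b*w)


pairs = [(3, 5), (200, 17), (255, 255)]
weights = [7, 100, 255]
sum_a, sum_b = packed_mac(pairs, weights)
```

With a 24-bit shift, 8 guard bits sit above each 16-bit product, which is what allows several products to be accumulated before the low field could spill into the high one.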
Classification: TN403 [Electronics and Telecommunications: Microelectronics and Solid-State Electronics]