DSP Blocks Suited to Low-Bit-Width Multiply-Accumulate in FPGAs  (Cited by: 1)

Improved DSP Blocks for Low-bit Width Multiply Accumulates in FPGA


Authors: FAN Di; WANG Jian[1]; LAI Jinmei[1] (State Key Laboratory of ASIC & Systems, Fudan University, Shanghai 201203, China)

Affiliation: [1] State Key Laboratory of ASIC & Systems, Fudan University, Shanghai 201203

Source: Journal of Fudan University: Natural Science, 2020, No. 5, pp. 575-584 (10 pages)

Abstract: Many advanced field-programmable gate arrays (FPGAs) from Xilinx and Intel use digital signal processing (DSP) blocks built around multipliers with a relatively wide, fixed bit width, and these blocks often cannot support low-bit-width multiply-accumulate (MAC) operations efficiently. To address this, the paper proposes a new DSP block that supports low-bit-width MAC. While retaining the functions of the Xilinx DSP48E1, it combines data shifting, multiplier splitting, and the single instruction, multiple data (SIMD) mode of the post-adder to compute in parallel either two 8-bit MACs or two pairs of 4-bit MACs that share a multiplier operand, while reserving enough guard bits to prevent overflow. Splitting the multiplier shortens partial-product compression time, and the new modes raise DSP block utilization, so fewer DSP blocks are needed to compute a given number of low-bit-width MACs and the total area used falls. Experimental results show that, compared with a baseline DSP implementing the DSP48E1 functions, the new DSP is 9% faster. When implementing twice as many 8-bit MACs, and when implementing four times as many shared-operand 4-bit MACs, the total DSP block area is reduced by 40.8% in both cases, while the area of a single DSP block increases by 18%. Compared with DSP blocks supporting low-bit-width MAC proposed in other literature, the new block offers stronger support for 4-bit MAC, and its modifications are better suited to the functional characteristics of Xilinx DSP blocks.
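The abstract does not describe the block's datapath, so the following is only a behavioural sketch of one ingredient it names: a SIMD post-adder that accumulates two 8-bit products in separate lanes with guard bits. It assumes a 48-bit accumulator split into two 24-bit lanes (the width used by the DSP48E1 TWO24 SIMD mode); the lane width, the function names, and the operand layout are illustrative assumptions, and multiplier splitting and data shifting are not modelled.

```python
# Behavioural sketch only; not the paper's circuit.
LANE_BITS = 24                      # assumed lane width (48-bit accumulator / 2)
LANE_MASK = (1 << LANE_BITS) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def simd_mac(acc, a0, b0, a1, b1):
    """One step: add a0*b0 to the low lane and a1*b1 to the high lane.

    Each lane wraps modulo 2**LANE_BITS, so no carry crosses between lanes.
    """
    lo = (acc & LANE_MASK) + a0 * b0
    hi = ((acc >> LANE_BITS) & LANE_MASK) + a1 * b1
    return ((hi & LANE_MASK) << LANE_BITS) | (lo & LANE_MASK)


def unpack(acc):
    """Split the packed accumulator into two signed lane results."""
    return to_signed(acc, LANE_BITS), to_signed(acc >> LANE_BITS, LANE_BITS)


def dot2(xs0, ws0, xs1, ws1):
    """Compute two independent signed 8-bit dot products in one accumulator."""
    acc = 0
    for a0, b0, a1, b1 in zip(xs0, ws0, xs1, ws1):
        acc = simd_mac(acc, a0, b0, a1, b1)
    return unpack(acc)
```

In this model a signed 8-bit product needs at most 16 bits, so a 24-bit lane leaves 8 bits of headroom for accumulation; the guard-bit budget of the actual block may differ.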

Keywords: field-programmable gate array; digital signal processing; multiply-accumulate; low bit width

Classification: TN403 [Electronics and Telecommunications: Microelectronics and Solid-State Electronics]

 
