Parallel FFT algorithm implementation based on coarse-grained reconfigurable architecture

Cited by: 3

Authors: Cao Peng[1], Yang Jinjiang[1], Mei Chen[1]

Affiliation: [1] National ASIC System Engineering Research Center, Southeast University, Nanjing 210096, China

Source: Journal of Southeast University (Natural Science Edition), 2013, No. 6, pp. 1174-1179 (6 pages)

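Funding: National Natural Science Foundation of China (61204023; 61203251; 61272183); National High Technology Research and Development Program of China (863 Program) (2012AA012703)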

Abstract: To improve the computational performance of the parallel fast Fourier transform (FFT) algorithm, a new complex FFT implementation is proposed based on REMUS_LPP (reconfigurable embedded multimedia system, low performance processor), a coarse-grained reconfigurable architecture (CGRA). The lower stages of the FFT are first computed in local serial mode; the intermediate results of the lower stages are then exchanged, and the higher stages are executed in parallel. To optimize the data flow within and between reconfigurable computing arrays (RCAs), pipeline bubble elimination and data block rearrangement techniques are proposed, which improve performance and reduce the on-chip memory requirement. Measurements on a fabricated chip show that the proposed FFT implementation runs at 2.15 to 13.60 times the speed of other comparable parallel computing architectures, while its on-chip memory is reduced to 7.0% to 28.1% of that required by other methods.
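The two-phase schedule described in the abstract (lower stages computed locally, then higher stages computed across the exchanged results) can be illustrated with a minimal software sketch. It assumes a radix-2 decimation-in-time FFT, which the abstract does not specify, and it does not model the REMUS_LPP hardware, pipeline bubble elimination, or data block rearrangement. The names `two_phase_fft`, `fft_stage`, and `num_units` are hypothetical and chosen only for illustration.

```python
import cmath


def bit_reverse_permute(x):
    """Return a copy of x reordered by bit-reversed index (len(x) must be a power of two)."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r = 0
        for b in range(bits):
            if (i >> b) & 1:
                r |= 1 << (bits - 1 - b)
        out[r] = complex(x[i])
    return out


def fft_stage(a, lo, hi, size):
    """Apply one radix-2 DIT stage with butterfly group size `size` to a[lo:hi], in place."""
    half = size // 2
    for base in range(lo, hi, size):
        for k in range(half):
            w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
            u = a[base + k]
            t = w * a[base + k + half]
            a[base + k] = u + t
            a[base + k + half] = u - t


def two_phase_fft(x, num_units):
    """FFT of x split into a block-local phase and a cross-block phase.

    num_units is a hypothetical count of processing units; it must be a
    power of two that divides len(x).
    """
    n = len(x)
    if n == 0 or n & (n - 1) or num_units < 1 or num_units & (num_units - 1) or n % num_units:
        raise ValueError("len(x) and num_units must be powers of two, num_units <= len(x)")
    a = bit_reverse_permute(x)
    block = n // num_units

    # Phase 1: lower stages. Every butterfly stays inside one block, so each
    # unit could process its own block serially without talking to the others.
    for unit in range(num_units):
        lo = unit * block
        size = 2
        while size <= block:
            fft_stage(a, lo, lo + block, size)
            size *= 2

    # Phase 2: higher stages. Butterflies now pair elements from different
    # blocks, which is where an exchange of intermediate results is needed.
    # Butterflies within one stage are mutually independent.
    size = 2 * block
    while size <= n:
        fft_stage(a, 0, n, size)
        size *= 2
    return a


def naive_dft(x):
    """O(n^2) reference DFT used only to check the sketch."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

In this sketch the split point is fixed by the block size: with 16 points and 4 units, the first two stages are block-local and the last two cross block boundaries. Comparing `two_phase_fft(x, 4)` against `naive_dft(x)` is a simple way to check that the split does not change the result.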

Keywords: coarse-grained reconfigurable architecture; parallel FFT algorithm; REMUS_LPP

Classification code: TN302 [Electronics and Telecommunications: Physical Electronics]

 
