Efficient Parallel Implementation of FFT on SCC and SCC's Expansibility Research

Authors: WANG Qing (汪清)[1,2], GU Naijie (顾乃杰)[1,2], HE Songsong (何颂颂)[1,2], YANG Yangchao (杨阳朝)[1,2]

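Affiliations: [1] School of Computer Science, University of Science and Technology of China, Hefei 230027; [2] Anhui Province Key Laboratory of Computing and Communication Software, Hefei 230027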

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2014, No. 6, pp. 1207-1211 (5 pages)

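Funding: Supported by the National "Core Electronic Devices, High-end Generic Chips and Basic Software" ("核高基") Major Project (2009ZX01028-002-003-005) and the National Natural Science Foundation of China (60833004)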

Abstract: Targeting the SCC (Single-Chip Cloud Computer) architecture, this paper improves the efficiency of a parallel FFT implementation through three techniques: improved communication routing, preprocessing of message passing, and repartitioning of data processing. The improved implementation is then used to study the scalability of the SCC. Experimental results show that, within a certain range of problem sizes, the improved FFT on the SCC achieves an average speedup of 4.10x on 2 cores (up to 4.78x), 6.01x on 4 cores (up to 6.77x), 10.46x on 8 cores (up to 11.53x), 16.20x on 16 cores (up to 18.51x), and 21.17x on 32 cores (up to 24.20x). As the problem size grows, the inter-core communication bandwidth stabilizes and the 32-core speedup gradually increases, indicating that the SCC has good scalability.

Keywords: FFT; SCC; RCCE; parallelization; speedup; scalability

Classification number: TP301 [Automation and Computer Technology: Computer System Architecture]

 
