Multi-Dimensional FFT Implementation and Optimization on ARMv8 Platform  Cited by: 10


Authors: CHEN Tun; LI Zhi-Hao [1,2]; JIA Hai-Peng; ZHANG Yun-Quan [1]

Affiliations: [1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100190

Source: Chinese Journal of Computers, 2019, No. 11, pp. 2384-2402 (19 pages)

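Funding: National Key R&D Program of China (2017YFB0202105, 2016YFB0200803, 2017YFB0202302); National Natural Science Foundation of China, Young Scientists Fund (61602443); National Natural Science Foundation of China, Key Program (61272136); National Natural Science Foundation of China, Innovative Research Groups (61521092); Guangdong Province Major Science and Technology Special Project (2015B010108006)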

Abstract: The Fast Fourier Transform (FFT) is a fast algorithm for computing the Discrete Fourier Transform (DFT) or its inverse, and it is widely used in engineering, science, and mathematics. With the development of the ARM architecture, and especially since the introduction of ARMv8, ARM's computing capability has improved greatly and its application fields have become increasingly broad. Research on the ARM architecture has therefore become a hotspot, and building a complete ARM software ecosystem is important. So far, however, there have been few implementations and optimizations of high-performance FFT algorithms on the ARM platform, which gives the study of FFT on ARMv8 both research significance and practical value. We implement and optimize PerfFFT, a high-performance multi-dimensional FFT library for the ARMv8 platform. It applies FFT butterfly network optimization, butterfly computation optimization, automatic butterfly generation, SIMD optimization, assembly optimization, memory alignment, a cache-aware blocking algorithm, efficient matrix transposition, and other optimization methods, which together significantly improve FFT performance. Experimental results show that PerfFFT achieves a 10% to 591% performance improvement over FFTW, currently the most widely used open-source FFT library, and a 13% to 44% improvement over the ARM high-performance commercial library (ARM Performance Library).

Our main contributions are as follows. First, we propose a set of FFT implementation and optimization methods for the ARMv8 platform, which not only improve FFT performance on ARMv8 but also offer practical guidance for implementing other algorithms on the ARM platform. Second, we propose a scheme for automatically generating FFT butterfly calculation code. A computational template is formed by abstracting and extracting the typical computational patterns of butterfly calculations for FFTs of different radices. On this basis, it can automatically generate high-performance code of different radix FFT butterfly calcul
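To make the terms in the abstract concrete, the following is a minimal sketch of a 2-D FFT built from a radix-2 butterfly and a tile-by-tile transpose. It is not PerfFFT code: this record does not show the library's implementation, and the function names (`fft_radix2`, `transpose_blocked`, `fft2d`), the `block` parameter, and the choice of plain Python over ARMv8 SIMD or assembly are all assumptions made for illustration. It uses the standard row-column method, in which 1-D FFTs run over contiguous rows and a transpose between passes takes the place of strided column access.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Radix-2 butterfly: one twiddle multiply, one add, one subtract.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose_blocked(a, block=4):
    """Transpose a rows x cols matrix one block x block tile at a time.

    Working tile by tile is the basic idea behind cache-aware blocking;
    the tile size here is an illustrative value, not a tuned one.
    """
    rows, cols = len(a), len(a[0])
    out = [[0j] * rows for _ in range(cols)]
    for i0 in range(0, rows, block):
        for j0 in range(0, cols, block):
            for i in range(i0, min(i0 + block, rows)):
                for j in range(j0, min(j0 + block, cols)):
                    out[j][i] = a[i][j]
    return out


def fft2d(a, block=4):
    """2-D FFT by the row-column method: row FFTs, transpose, row FFTs, transpose."""
    pass1 = [fft_radix2(row) for row in a]
    pass2 = [fft_radix2(row) for row in transpose_blocked(pass1, block)]
    return transpose_blocked(pass2, block)
```

Calling `fft2d(a)` on a list of equal-length rows whose dimensions are both powers of two returns the 2-D DFT in the same row-major layout. A tuned library would presumably replace the recursion with generated, vectorized butterflies of several radices, which is the part the abstract attributes to code generation and SIMD optimization.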

Keywords: ARMv8; FFT algorithm; FFTW; ARMPL; SIMD optimization; cache optimization; matrix blocking

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]

 
