Authors: CHEN Tun (陈暾); LI Zhi-Hao (李志豪) [1,2]; JIA Hai-Peng (贾海鹏); ZHANG Yun-Quan (张云泉) [1]
Affiliations: [1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100190
Source: Chinese Journal of Computers (《计算机学报》), 2019, No. 11, pp. 2384-2402 (19 pages)
Funding: National Key R&D Program of China (2017YFB0202105, 2016YFB0200803, 2017YFB0202302); National Natural Science Foundation of China, Young Scientists Fund (61602443); National Natural Science Foundation of China, Key Program (61272136); National Natural Science Foundation of China, Innovative Research Groups (61521092); Guangdong Province Major Science and Technology Special Project (2015B010108006)
Abstract: The fast Fourier transform (FFT) is a fast algorithm for computing the discrete Fourier transform (DFT) or its inverse, and it is widely used in engineering, science, and mathematics. With the development of the ARM architecture, and especially the introduction of ARMv8, ARM processors have become considerably more capable and are used in an increasingly wide range of applications. Research on ARM has therefore become a hotspot, and building a complete ARM software ecosystem is important. So far, however, there have been few implementations and optimizations of high-performance FFT algorithms on ARM platforms, so studying the implementation and optimization of FFT on ARMv8 has both research significance and practical value. This paper implements and optimizes PerfFFT, a high-performance multi-dimensional FFT library for the ARMv8 platform. PerfFFT applies FFT butterfly network optimization, butterfly computation optimization, automatic butterfly generation, SIMD optimization, assembly optimization, memory alignment, a cache-aware blocking algorithm, efficient matrix transposition, and other optimization methods, which together significantly improve FFT performance. Experimental results show that PerfFFT achieves a 10% to 591% performance improvement over FFTW, currently the most widely used open-source FFT library, and a 13% to 44% improvement over the ARM Performance Library, ARM's high-performance commercial library. Our main contributions are as follows. First, we propose a set of FFT implementation and optimization methods for the ARMv8 platform, which not only improve FFT performance on ARMv8 but also offer practical guidance for implementing other algorithms on ARM platforms. Second, we propose a scheme for automatically generating FFT butterfly computation code. A computational template is formed by abstracting and extracting the typical computational patterns of butterfly calculations for different FFT radices. On this basis, it can automatically generate high-performance code for different-radix FFT butterfly calcul
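Keywords: ARMv8; FFT algorithm; FFTW; ARMPL; SIMD optimization; cache optimization; matrix blocking
Classification: TP393 [Automation and Computer Technology: Computer Application Technology]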