High-performance implementation and optimization of the square root function based on SIMD  (Cited by: 2)

Authors: ZHAO Yong-hao, JIA Hai-peng [2], ZHANG Yun-quan [2], ZHANG Si-jia

Affiliations: [1] College of Information Engineering, Dalian Ocean University, Dalian 116023, Liaoning, China; [2] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Source: Computer Engineering & Science, 2021, No. 4, pp. 662-669 (8 pages)

Funding: National Key R&D Program of China (2017YFB0202105, 2018YFC0809306, 2016YFB0200803, 2017YFB0202302); National Natural Science Foundation of China (61972376); Beijing Natural Science Foundation (L182053).

Abstract: In application scenarios such as computer graphics, integral calculation, and neural networks, a high-performance implementation of the square root function plays an important role in building the basic software ecosystem of a processor. As ARM architecture processors become widely used, fast implementations of mathematical functions on ARM become more critical. Many current processors adopt a SIMD architecture, so SIMD-based methods for high-performance function computation are of considerable research interest. To this end, this paper implements and optimizes the square root function for high performance. First, an efficient square root algorithm is designed by analyzing how IEEE 754 floating-point numbers are stored in memory. Second, the accuracy of the algorithm is improved by combining the reciprocal square root with a Taylor-formula algorithm. Finally, the performance of the algorithm is further improved through SIMD optimization. Experimental results show that, while meeting the accuracy requirement, the implemented square root function is about 7 times faster than the libm library and about 3 times faster than the square root instruction provided by ARM V8.
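The abstract does not give the algorithm itself, so the following is only a minimal scalar sketch of the general idea it names: derive an initial estimate from the IEEE 754 bit pattern, refine a reciprocal square root, and multiply by the input. It is not the authors' method. It assumes single precision, uses the widely known magic-constant estimate, and substitutes Newton-Raphson refinement for the paper's Taylor-formula step. The function names `approx_rsqrt` and `approx_sqrt` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximates 1/sqrt(x) for positive, normal x.
 * Assumption: float is IEEE 754 binary32. */
static float approx_rsqrt(float x)
{
    uint32_t bits;
    float y;

    /* Reinterpret the float's bits without violating aliasing rules. */
    memcpy(&bits, &x, sizeof bits);

    /* Halving and negating the biased exponent field gives a rough
     * estimate of x^(-1/2); the constant is the well-known one from the
     * "fast inverse square root" trick, not a value from the paper. */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* Newton-Raphson refinement for f(y) = 1/y^2 - x.
     * Three steps bring the estimate close to single precision. */
    for (int i = 0; i < 3; ++i) {
        y = y * (1.5f - 0.5f * x * y * y);
    }
    return y;
}

/* Hypothetical helper: sqrt(x) = x * (1/sqrt(x)) for x >= 0.
 * Negative inputs, NaN, infinity, and subnormals are not handled. */
static float approx_sqrt(float x)
{
    if (x == 0.0f) {
        return 0.0f; /* avoid 0 * (large estimate) rounding artifacts */
    }
    return x * approx_rsqrt(x);
}
```

Calling `approx_sqrt(2.0f)` should return a value close to 1.41421. A SIMD version would apply the same integer and floating-point steps to several lanes at once, which is where a vectorized implementation would be expected to gain over a scalar library call.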

Keywords: square root function; SIMD; high performance; numerical analysis; ARM V8 architecture

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
