基于SIMD扩展部件的长向量超越函数实现方法  被引量:2

Implementation of Transcendental Functions on Vectors Based on SIMD Extensions

在线阅读下载全文

作  者:刘聃 郭绍忠[1] 郝江伟 许瑾晨[1] LIU Dan;GUO Shao-zhong;HAO Jiang-wei;XU Jin-chen(State Key Laboratory of Mathematical Engineering and Advanced Computing,PLA Information Engineering University,Zhengzhou 450002,China)

机构地区:[1]信息工程大学数学工程与先进计算国家重点实验室,郑州450002

出  处:《计算机科学》2021年第6期26-33,共8页Computer Science

基  金:国家自然科学基金项目(61802434)。

摘  要:基础数学函数库是计算机系统非常关键的软件模块,然而国产申威平台上的长向量超越函数只能依靠循环调用系统标量函数来间接实现,该方法无法充分发挥申威平台SIMD扩展部件的计算性能。为了有效解决此问题,实现了申威平台基于SIMD扩展部件底层优化的长向量超越函数,提出了浮点计算融合算法,解决了两分支结构算法难以向量化的问题;提出了基于Estrin算法动态分组的大阶数多项式实现方法,提高了多项式汇编计算的流水性能。这是在国产申威平台上首次实现长向量超越函数库,提供的函数接口包含三角函数、反三角函数、对数函数、指数函数等。实验结果表明,双精度版本最大误差控制在3.5ULP(unit in the last place)以下,单精度版本最大误差控制在0.5ULP以下,该性能与申威平台直接循环调用系统标量函数相比有显著提高,平均加速比为3.71。The basic mathematical function library is a critical soft module in the computer system.However,the long vector transcendental function on the domestic Shenwei platform can only be implemented indirectly by cyclic utilizing the system scalar function currently,thus limiting the computing capability of the SIMD extensions of Shenwei platform.In order to solve this problem effectively,this paper implements the long vector transcendental function based on lower-level optimization of SIMD extensions of Shenwei platform and proposes the floating-point computing fusion algorithm for solving the problem that the two-branch structure algorithm is difficult to vectorize.It also proposes the implementation method of higher degree polynomials based on the dynamic grouping of Estrin algorithm,which improves the pipelining performance of polynomial assembly evaluation.This is the first time to implement the long vector transcendental function library on the Shenwei platform.The provided function interfaces include trigonometric functions,inverse trigonometric functions,logarithmic functions,exponential functions,etc.The experimental result shows that the maximum error of double precision version is controlled below 3.5ULP(unit in the last place),and the maximum error of single precision version is controlled below 0.5ULP.Compared with the scalar function of Shenwei platform,the performance is significantly improved,and the average speedup ratio is 3.71.

关 键 词:基础数学库 向量超越函数 国产平台 流水优化 浮点计算 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象