检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:赵永浩 贾海鹏[2] 张云泉[2] 张思佳 ZHAO Yong-hao;JIA Hai-peng;ZHANG Yun-quan;ZHANG Si-jia(College of Information Engineering,Dalian Ocean University,Dalian 116023;State Key Laboratory of Computer Architecture,Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190,China)
机构地区:[1]大连海洋大学信息工程学院,辽宁大连116023 [2]中国科学院计算技术研究所计算机体系结构国家重点实验室,北京100190
出 处:《计算机工程与科学》2021年第4期662-669,共8页Computer Engineering & Science
基 金:国家重点研发计划(2017YFB0202105,2018YFC0809306,2016YFB0200803,2017YFB0202302);国家自然科学基金(61972376);北京市自然科学基金(L182053)。
摘 要:在计算机图形学、积分计算和神经网络等应用场景中,平方根函数的高性能实现在构建处理器的基础软件生态中起到了十分重要的作用。随着ARM架构处理器得到广泛的使用,研究ARM架构下的函数快速算法实现变得更加关键。当前大量处理器都采用了SIMD架构,所以,研究基于SIMD实现高性能函数计算方法具有重要的研究意义和发展前景。因此,对平方根函数进行了高性能的实现与优化。通过分析IEEE 754标准的浮点数在内存中的存储格式,设计了高效的平方根函数算法;然后通过结合平方根倒数和泰勒公式算法,进一步提高了算法精度;最后通过SIMD优化进一步提升了算法性能。实验结果表明,在满足精度的前提下,相比于libm算法库,实现的平方根函数的,性能提高了约7倍,相比于ARM V8提供的计算平方根的指令在性能上提高了约3倍。In computer graphics, integral calculation, neural network and other application scenarios, the high-performance implementation of Square Root function plays a very important role in the construction of the basic software ecology of processors. With the widespread use of ARM architecture processors, it becomes more critical to study the fast algorithm implementation of functions under ARM architecture. At present, SIMD architecture is adopted by a large number of processors. Therefore, it is of great significance and development prospect to study the high performance function calculation method based on SIMD. To this end, this paper implements and optimizes the Square Root function with high performance. By analyzing the storage format of IEEE 754 standard float point number in memory, an efficient algorithm of Square Root function is designed, and then the algorithm precision is further improved by combining Square Root inverse and Taylor formula algorithm. Finally, the algorithm performance is further improved by SIMD optimization. According to the experimental results, on the premise of satisfying the accuracy, the performance of the implemented Square Root function is more than 7 times higher than the libm algorithm library, and more than 3 times higher than the instruction of calculating Square Root provided by ARM V8.
关 键 词:平方根函数 SIMD 高性能 数值分析 ARM V8架构
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:18.221.100.57