Small-Scale Irregular TRSM Implementation and Optimization (小规模非规则TRSM实现与优化)


Authors: Guo Rongyuan (郭容园); Jia Haipeng (贾海鹏) [2]; Zhang Yunquan (张云泉) [2]; Wei Cunyang (韦存阳); Deng Mingsen (邓明森) [1]; Chen Jingrui (陈婧蕊); Zhou Zhenya (周振亚)

Affiliations: [1] School of Information, Guizhou University of Finance and Economics, Guiyang 550025; [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [3] Empyrean Technology Co., Ltd., Beijing 100102

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2025, No. 2, pp. 517-531 (15 pages)

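Funding: National Key Research and Development Program of China (2023YFB3001701); Science and Technology Major Project of Shanxi Province (202201010101004); National Natural Science Foundation of China (61972376, 62372432, 62072431).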

Abstract: TRSM (triangular matrix equation solver) is a commonly used algorithm for solving systems of linear equations and a core routine of many scientific computing libraries and mathematical software packages. It is widely used in scientific computing, engineering computing, and machine learning. Small-scale irregular TRSM restricts the problem scope: it targets the efficient handling of small, irregularly shaped inputs. As high-performance computing becomes more specialized and fine-grained, demand for small-scale irregular TRSM computation is growing in both the scientific and industrial communities. Traditional algorithms focus on large-scale, regular TRSM and are inefficient on small-scale irregular inputs. This paper proposes an optimization scheme for small-scale irregular TRSM that takes hardware architecture and application characteristics into account, designing high-performance kernels from the perspectives of register blocking, boundary handling, and vectorized computation. On this basis, it builds SI_TRSM (small-scale irregular TRSM), a library covering double-precision real and double-precision complex data, which substantially improves the performance of the algorithm. Experimental results show that, compared with the corresponding routines in MKL (Intel Math Kernel Library), SI_TRSM improves average performance by 29.4 times on double-precision small-scale irregular real inputs and by 24.6 times on double-precision small-scale irregular complex inputs.
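For orientation, the operation that TRSM performs can be sketched as a plain forward substitution. The following is a minimal reference sketch only, assuming the left-side, lower-triangular, non-transposed, non-unit-diagonal case with column-major double-precision storage; the function name `trsm_lower_ref` is hypothetical and this is not the SI_TRSM kernel, whose register blocking, boundary handling, and vectorization the abstract describes only at a high level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not from the paper).
 * Solves L * X = B and overwrites B with X, where
 *   L is m x m, lower triangular, non-unit diagonal (stored in A),
 *   B is m x n,
 * and both are column-major with leading dimensions lda and ldb. */
void trsm_lower_ref(size_t m, size_t n,
                    const double *A, size_t lda,
                    double *B, size_t ldb)
{
    for (size_t j = 0; j < n; ++j) {          /* one right-hand side per column */
        double *b = B + j * ldb;
        for (size_t k = 0; k < m; ++k) {      /* forward substitution */
            double diag = A[k + k * lda];
            assert(diag != 0.0);              /* L must be non-singular */
            b[k] /= diag;                     /* x_k is now final */
            for (size_t i = k + 1; i < m; ++i)
                b[i] -= b[k] * A[i + k * lda];  /* eliminate x_k from later rows */
        }
    }
}
```

In a scalar loop nest like this, small or irregular sizes of m and n leave little work to amortize call and loop overhead, which is the regime the abstract says the specialized kernels target.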

Keywords: TRSM algorithm; BLAS; small-scale irregular; SIMD; assembly optimization

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]

 
