Authors: YU Li[1], HAN Lin, LUO Youcai, SHANG Jiandong
Affiliations: [1] School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, Henan, China; [2] National Supercomputing Center in Zhengzhou, Zhengzhou 450001, Henan, China
Source: Computer Engineering (《计算机工程》), 2024, No. 2, pp. 51-58 (8 pages)
Funding: Major Science and Technology Project of Henan Province (221100210600).
Abstract: The domestically developed, independently controllable FT-M6678 platform currently has no implementation of symmetric matrix eigenvalue solving, and the mathematical libraries available on the platform do not adequately meet the needs of such problems. Targeting the FT-M6678 processor, this study implements and optimizes the symmetric matrix eigenvalue solver (SYEV) algorithm, extending the linear algebra library of the FT-M6678 platform. Based on an analysis of the SYEV implementation and its runtime hotspots, three kinds of optimization are applied on the FT-M6678 platform: compilation optimization, memory access optimization, and vector parallelization. Compilation optimization uses different compiler options to guide the compiler in optimizing the program for speed. Memory access optimization comprises cache optimization and optimized allocation of data and program segments, improving the efficiency of matrix data access. Vector parallelization comprises loop unrolling and Single Instruction Multiple Data (SIMD) instruction-level parallel optimization adapted to the FT-M6678 platform, improving the computational efficiency of the program. The implemented and optimized algorithm is verified for correctness and analyzed for performance on the FT-M6678 platform. The results show that the algorithm passes the official Linear Algebra PACKage (LAPACK) test suite, that the speedup on the FT-M6678 platform reaches up to 58.346 times, and that its speed is 2.053 times that on the TMS320C6678 platform.
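Keywords: symmetric matrix eigenvalue; FT-M6678 platform; hotspot analysis; cache optimization; vector parallelism
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]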
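The abstract names loop unrolling as one of the vector parallelization steps. The following is a minimal, generic C sketch of that technique applied to a dot product, a kernel of the kind that commonly appears in eigenvalue solvers. It is not the authors' code: the function names `dot_ref` and `dot_unrolled4`, the unroll factor of 4, and the choice of kernel are all assumptions made for illustration, and no FT-M6678 intrinsics are used.

```c
#include <assert.h>
#include <stddef.h>

/* Reference dot product: one multiply-add per loop iteration. */
static double dot_ref(const double *x, const double *y, size_t n)
{
    assert((x != NULL && y != NULL) || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/*
 * Hypothetical 4-way unrolled variant (illustrative only).
 * Four independent accumulators remove the loop-carried dependence
 * on a single sum, which gives a compiler more freedom to schedule
 * the multiply-adds in parallel or map them onto SIMD instructions.
 */
static double dot_unrolled4(const double *x, const double *y, size_t n)
{
    assert((x != NULL && y != NULL) || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    double sum = (s0 + s1) + (s2 + s3);

    /* Remainder loop: handle the last n % 4 elements. */
    for (; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Because the unrolled version adds the products in a different order, its result can differ from the reference in the last few bits for general floating-point data; for small integer-valued inputs the two agree exactly.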