Authors: GAO Yiqin (高亦沁), LUO Zhiyu (罗智宇), WANG Yichao (王一超), LIN Xinhua (林新华); Network & Information Center, Shanghai Jiao Tong University, Shanghai 200240, China
Source: Computer Science (《计算机科学》), 2025, No. 5, pp. 11-24 (14 pages)
Funding: National Key Research and Development Program of China (2023YFB3002001).
Abstract: Supercomputers are critical national infrastructure. During the 14th Five-Year Plan period, China is building post-exascale domestic supercomputers to support major computing applications of national importance. The operating system is one of the core system software components of a supercomputer, and its overhead affects the performance of the whole machine, so OS evaluation has become an important research topic for the next generation of domestic supercomputers. openEuler offers good performance and compatibility on systems equipped with Kunpeng processors, but it has not yet been deployed at scale on supercomputers. Its performance therefore needs to be evaluated comprehensively, and the bottlenecks found need to be optimized. The work has two parts. 1) We evaluate the compatibility of openEuler and its performance when running HPC applications, using CentOS as the reference for comparison. The results show that for applications that are not intensive in collective communication, the performance of openEuler is comparable to that of CentOS. However, when OpenMPI is used for collective communication operations such as Allreduce, performance on openEuler decreases by up to 76.83%, and at thousand-core scale the parallel efficiency of communication-intensive applications decreases by up to 23.01%. 2) Based on the MPI collective communication performance issues identified during the evaluation, we propose a performance modeling and optimization method. The method builds on the Hockney model of point-to-point communication to model each implementation algorithm of a collective operation. It predicts communication time for different numbers of processes and message sizes, which makes it possible to select a suitable implementation algorithm. Through the MCA interface of OpenMPI, the method can adjust the algorithm selection dynamically at runtime. After optimization, the performance of scientific computing applications on openEuler improves significantly, with runtime reduced by up to 26%.
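Keywords: high-performance computing; domestic supercomputer; operating system; performance evaluation; collective communication performance
Classification: TP316 [Automation and Computer Technology: Computer Software and Theory]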
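The selection idea in the abstract (predict the time of each collective algorithm from a point-to-point Hockney model, then pick the fastest) can be illustrated with a minimal sketch. This is not the authors' model: the two Allreduce cost formulas below are common textbook approximations that ignore the reduction compute cost, and the names `hockney`, `MODELS`, and `pick_allreduce` as well as the values of `alpha` (latency in seconds) and `beta` (seconds per byte) are hypothetical.

```python
import math


def hockney(m, alpha, beta):
    """Hockney point-to-point time for an m-byte message: latency plus per-byte cost."""
    return alpha + beta * m


def allreduce_recursive_doubling(p, m, alpha, beta):
    # Assumed textbook model: ceil(log2 p) exchange rounds,
    # each round moving the full m-byte buffer.
    return math.ceil(math.log2(p)) * hockney(m, alpha, beta)


def allreduce_ring(p, m, alpha, beta):
    # Assumed textbook model: reduce-scatter followed by allgather,
    # 2 * (p - 1) steps in total, each moving an m/p-byte chunk.
    return 2 * (p - 1) * hockney(m / p, alpha, beta)


# Hypothetical registry of candidate algorithms; a real study would cover
# every algorithm the MPI library actually implements.
MODELS = {
    "recursive_doubling": allreduce_recursive_doubling,
    "ring": allreduce_ring,
}


def pick_allreduce(p, m, alpha, beta):
    """Return the name of the algorithm with the lowest predicted time."""
    return min(MODELS, key=lambda name: MODELS[name](p, m, alpha, beta))
```

With illustrative parameters such as `alpha = 1e-6` and `beta = 1e-9`, `pick_allreduce(64, 8, alpha, beta)` favors the latency-bound recursive doubling model, while `pick_allreduce(64, 64 * 1024 * 1024, alpha, beta)` favors the bandwidth-bound ring model. Mapping the chosen name onto an OpenMPI MCA setting is left out here; the available parameters and algorithm identifiers depend on the installed version and are best checked with `ompi_info`.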