Authors: GAO Xianyang (高显扬); WU An (吴安)[1]; CI Tanlong (慈潭龙); LI Jinfeng (李金锋); ZHAO Weikang (赵伟康) (IEIT Systems Co., Ltd., Jinan 250101, China)
Affiliation: [1] IEIT Systems Co., Ltd. (浪潮电子信息产业股份有限公司), Jinan 250101, China
Source: Computer Engineering and Applications (《计算机工程与应用》), 2025, No. 5, pp. 344-354 (11 pages)
Abstract: To meet the demands of diverse application scenarios and the challenge of integrating heterogeneous computing power, this paper proposes a new hardware and software architecture, the "Multi-Core Composable Modular Server" (一机多芯模块化服务器, literally "one machine, multiple chips"). Instead of the traditional CPU-centric design, the architecture is centered on the server's system interconnect and switching fabric, and it disaggregates the heterogeneous computing units and system hardware resources into pools. Standardized interface definitions, together with unified control and management, integrate heterogeneous underlying hardware, enabling cooperative use of heterogeneous computing power, on-demand resource allocation, and unified system scheduling and management. Key technologies include high-performance non-blocking bus interconnect switching, long-distance low-latency interconnection of pooled units, disaggregated pooling of memory and storage resources, whole-system monitoring and management, and system resource topology management. The server fully disaggregates its hardware and recomposes it elastically, so that a single server system can host heterogeneous computing modules and allocate heterogeneous computing power and shared resources online, on demand. Experimental results show that the system achieves balanced peer-to-peer low-latency communication among 16 GPUs with linear improvement in system performance, supporting on-demand allocation of heterogeneous computing power for AI workloads; achieves sub-microsecond remote memory access, which extends memory bandwidth and capacity and effectively improves system performance; and supports fine-grained sharing of pooled storage resources, meeting the needs of high-concurrency storage applications across multiple hosts.
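The abstract describes disaggregating GPUs, memory, and storage into shared pools that are composed into hosts on demand and returned afterward. The Python sketch below illustrates only that general compose/release bookkeeping idea under stated assumptions; it is not the authors' management software. The class `ResourcePool`, its `compose`/`release` methods, the pool sizes (16 GPUs, 4096 GiB of memory, 64 storage slices), and the all-or-nothing allocation policy are all hypothetical. In the described system, composition is carried out through the interconnect switching fabric and system management software, which this sketch does not model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical pool of disaggregated resources shared by several hosts (illustration only)."""
    free_gpus: set[int]
    free_mem_gib: int
    free_storage_slices: set[int]
    # host name -> resources currently composed into that host
    assignments: dict[str, dict] = field(default_factory=dict)

    def compose(self, host: str, gpus: int, mem_gib: int, slices: int) -> dict:
        """Attach resources to a host only if every request can be met (all-or-nothing)."""
        if host in self.assignments:
            raise ValueError(f"{host} already has composed resources; release them first")
        if (gpus > len(self.free_gpus)
                or mem_gib > self.free_mem_gib
                or slices > len(self.free_storage_slices)):
            raise RuntimeError("insufficient pooled resources")
        # Pick the lowest-numbered free units; a real system would also weigh topology.
        gpu_ids = sorted(self.free_gpus)[:gpus]
        slice_ids = sorted(self.free_storage_slices)[:slices]
        self.free_gpus.difference_update(gpu_ids)
        self.free_storage_slices.difference_update(slice_ids)
        self.free_mem_gib -= mem_gib
        grant = {"gpus": gpu_ids, "mem_gib": mem_gib, "slices": slice_ids}
        self.assignments[host] = grant
        return grant

    def release(self, host: str) -> None:
        """Return everything composed into `host` to the shared pool."""
        grant = self.assignments.pop(host)
        self.free_gpus.update(grant["gpus"])
        self.free_storage_slices.update(grant["slices"])
        self.free_mem_gib += grant["mem_gib"]


if __name__ == "__main__":
    pool = ResourcePool(free_gpus=set(range(16)), free_mem_gib=4096,
                        free_storage_slices=set(range(64)))
    print(pool.compose("host-a", gpus=8, mem_gib=1024, slices=16))
    print(pool.compose("host-b", gpus=8, mem_gib=512, slices=8))
    pool.release("host-a")  # host-a's GPUs, memory, and slices return to the pool
    print(sorted(pool.free_gpus), pool.free_mem_gib, len(pool.free_storage_slices))
```

The all-or-nothing check keeps a host from receiving a partial allocation it cannot use; a topology-aware policy (for example, preferring GPUs behind the same switch) would replace the simple lowest-index selection.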
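Keywords: one machine with multiple chips (一机多芯); modular server; converged architecture; hardware disaggregation; resource pooling; heterogeneous computing power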
CLC number: TP302.1 [Automation and Computer Technology: Computer System Architecture]