Authors: Chen Bin (陈斌), Pei Pei (裴培)[2], Xu Peng (许鹏)
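Affiliations: [1] Intelligent Network & Innovation Center of China Unicom, Beijing 100048, China; [2] China Information Technology Designing & Consulting Institute Co., Ltd., Beijing 100048, China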
Source: Designing Techniques of Posts and Telecommunications (《邮电设计技术》), 2024, No. 9, pp. 1-6 (6 pages)
Abstract: With the rapid development of large models, demand for intelligent computing is growing far faster than chip performance is improving. Computing-cluster solutions and the concept of "DC as a Computer" have emerged in response, making the data center network particularly important. During large-model training and inference, clusters place extremely high demands on the stability of the network system. Based on the characteristics of large-model services and drawing on mainstream cluster networking technologies, this paper studies next-generation intelligent computing center network technologies for training scenarios, which provide ultra-large-scale networking, ultra-high throughput, and ultra-high stability. For inference scenarios, it studies how to build high-quality compute access networks using SDN+SRv6 programmable, computing-network-integrated intelligent scheduling and slicing. It also examines the technical challenges of collaborative training across DCs and the corresponding countermeasures.
Keywords: wide area network; intelligent computing center network; bandwidth pooling; cross-cluster model training
CLC number: TN915 [Electronics and Telecommunications: Communication and Information Systems]