Authors: Li Junjie [1]; Liu Yuyang; Huo Xiaoli [2] (China Telecom Research Institute, Beijing 102209, China)
Affiliations: [1] Research Institute of China Telecom Co., Ltd. [2] Transmission Network Research Center, Research Institute of China Telecom Co., Ltd.
Source: Information and Communications Technologies, 2024, No. 6, pp. 4-13 (10 pages)
Abstract: With the development of artificial intelligence technology, large language models (LLMs, such as ChatGPT) have shown excellent application prospects in many scenarios and have developed rapidly. However, as model scale continues to grow, the computational resources and time required to train these large-scale models have also grown explosively, and intelligent computing center clusters are rapidly evolving toward the 100,000-card or even million-card level. In this situation, distributed training across multiple data centers has become an important development direction. Owing to their ultra-large bandwidth, ultra-high reliability, and ultra-low latency, optical transmission networks have become the carrier base for the massive data involved in distributed training. Starting from an analysis of the training requirements of large models, this paper introduces the training capabilities and current status of mainstream companies, then analyzes and offers an outlook on optical transmission network technologies suitable for multi-data-center distributed training.
Keywords: large models; distributed training; intelligent computing center; optical transmission network; 800Gbit/s; multi-band
Classification: TP3 [Automation and Computer Technology — Computer Science and Technology]