Application of Distributed Techniques in Large Language Model Training and Inference


Author: ZHENG Weimin (郑纬民)[1]

Affiliation: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Source: Big Data Research (《大数据》), 2024, No. 5, pp. 1-10 (10 pages)

Funding: National Natural Science Foundation of China (No. U23A6007).

Abstract: In recent years, artificial intelligence has been widely applied across many fields, and the "pre-training and fine-tuning" of large language models (LLMs) has become its latest paradigm. Distributed techniques appear at every stage of the LLM lifecycle and support the development of LLMs. In the data acquisition stage, the file system SuperFS was developed to address the storage of massive numbers of small files; it meets the requirements of both low latency and scalability. In the data preprocessing stage, the efficient big data processing engine Chukonu (诸葛弩) was developed to address the high overhead of reading data from distributed file systems. In the model training stage, a distributed checkpoint strategy was proposed to address the poor read and write performance of checkpoint files, speeding up checkpoint reads and writes. In the model inference stage, the high-throughput inference scheme FastDecode and the LLM inference architecture Mooncake were developed to address the challenges that KVCache poses to storage systems. Applying distributed techniques enables LLMs to make full use of computing resources and speeds up training, which benefits the development of artificial intelligence.

Keywords: distributed technology; large model; massive small files; big data processing engine; checkpoint; KVCache

Classification code: TP319 [Automation and Computer Technology: Computer Software and Theory]

 
