Authors: Yanqi SHI, Peng LIANG, Hao ZHENG, Linbo QIAO, Dongsheng LI
Source: Frontiers of Information Technology & Electronic Engineering, 2025, No. 1, pp. 109-118 (10 pages)
Funding: Supported by the National Natural Science Foundation of China (Nos. 62025208 and 62421002).
Abstract: Large-scale deep learning models are trained in a distributed manner due to memory and computing resource limitations. Few existing strategy-generation approaches take memory minimization as the objective. To fill this gap, we propose a novel algorithm that generates optimal parallelism strategies under the constraint of minimal memory redundancy. We propose a redundant-memory cost model to calculate the memory overhead of each operator under a given parallel strategy. To generate the optimal parallelism strategy, we formulate the strategy search as an integer linear programming problem and use an efficient solver to find minimal-memory intra-operator parallelism strategies. Furthermore, the proposed algorithm has been extended and implemented in a multi-dimensional parallel training framework, characterized by high throughput and minimal memory redundancy. Experimental results demonstrate that our approach saves up to 67% of memory compared with the latest Megatron-LM strategies, while the throughput gap between our approach and its counterparts remains small.
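The search described in the abstract — pick one intra-operator parallelism strategy per operator so that total redundant memory is minimized — can be sketched with a toy brute-force search. This is only an illustration of the objective, not the paper's method: the paper solves the problem as an integer linear program with an efficient solver, and all operator names, strategy names, and memory costs below are invented for the example.

```python
from itertools import product

# Hypothetical per-operator strategy table (names and costs are illustrative,
# not taken from the paper): each entry maps a strategy to the redundant
# memory, in MB, that the strategy keeps replicated on every device.
OP_STRATEGIES = {
    "matmul_1": {"data_parallel": 120, "tensor_parallel": 30},
    "matmul_2": {"data_parallel": 100, "tensor_parallel": 25},
    "softmax":  {"data_parallel": 10,  "tensor_parallel": 40},
}

# Illustrative penalty charged when two consecutive operators use different
# strategies (a stand-in for the resharding terms of a real cost model).
RESHARD_COST = 15

def min_memory_strategy(ops=OP_STRATEGIES, reshard=RESHARD_COST):
    """Brute-force analogue of the ILP: choose one strategy per operator,
    minimizing total redundant memory plus resharding penalties between
    consecutive operators."""
    names = list(ops)
    best_cost, best_plan = float("inf"), None
    for plan in product(*(ops[n] for n in names)):
        # Per-operator redundant memory for this assignment.
        cost = sum(ops[n][s] for n, s in zip(names, plan))
        # Penalty for each adjacent pair that must reshard.
        cost += sum(reshard for a, b in zip(plan, plan[1:]) if a != b)
        if cost < best_cost:
            best_cost, best_plan = cost, dict(zip(names, plan))
    return best_cost, best_plan

cost, plan = min_memory_strategy()
print(cost, plan)  # the assignment with the smallest total redundant memory
```

An ILP formulation replaces this exhaustive enumeration with binary decision variables (one per operator-strategy pair) and a linear objective over the same cost terms, which is what makes the search tractable for real model graphs.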
Keywords: Deep learning; Automatic parallelism; Minimal memory redundancy
Classification: TN9 [Electronics and Telecommunications - Information and Communication Engineering]