Automatic parallelism strategy generation with minimal memory redundancy


Authors: Yanqi SHI, Peng LIANG, Hao ZHENG, Linbo QIAO, Dongsheng LI

Affiliation: [1] National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410000, China

Source: Frontiers of Information Technology & Electronic Engineering, 2025, No. 1, pp. 109-118 (10 pages)

Funding: supported by the National Natural Science Foundation of China (Nos. 62025208 and 62421002).

Abstract: Large-scale deep learning models are trained in a distributed manner due to memory and computing resource limitations. Few existing strategy generation approaches take memory minimization as the objective. To fill this gap, we propose a novel algorithm that generates optimal parallelism strategies under the constraint of minimal memory redundancy. We propose a novel redundant memory cost model to calculate the memory overhead of each operator under a given parallel strategy. To generate the optimal parallelism strategy, we formulate the strategy search problem as an integer linear programming problem and use an efficient solver to find minimal-memory intra-operator parallelism strategies. Furthermore, the proposed algorithm has been extended and implemented in a multi-dimensional parallel training framework, and is characterized by high throughput and minimal memory redundancy. Experimental results demonstrate that our approach achieves memory savings of up to 67% compared to the latest Megatron-LM strategies, while the throughput gap between our approach and its counterparts remains small.
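The integer-linear-programming formulation sketched in the abstract can be illustrated with a toy example: binary variables select exactly one candidate parallelism strategy per operator, the objective sums each choice's redundant memory, and a budget constraint stands in for the throughput consideration. This is a minimal sketch using `scipy.optimize.milp`; the operators, candidate strategies, and all cost numbers are illustrative assumptions, not values from the paper.

```python
# Toy 0/1 ILP: pick one intra-operator parallelism strategy per operator
# so that total redundant memory is minimized. All numbers are made up
# for illustration; they do not come from the paper's cost model.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Two operators, three candidate strategies each (e.g. data-parallel,
# tensor-parallel, replicated). Variables x[0..2] belong to operator 0,
# x[3..5] to operator 1.
mem  = np.array([4.0, 1.0, 8.0, 3.0, 1.5, 6.0])  # redundant memory per choice
comm = np.array([0.5, 2.0, 0.0, 0.5, 1.8, 0.0])  # communication cost per choice

# Exactly one strategy must be chosen for each operator.
pick_one = LinearConstraint(
    np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]]), lb=1, ub=1)

# Cap total communication cost, as a stand-in for a throughput requirement.
budget = LinearConstraint(comm.reshape(1, -1), lb=0, ub=4.0)

res = milp(c=mem,                       # minimize total redundant memory
           constraints=[pick_one, budget],
           integrality=np.ones(6),      # all variables are integers
           bounds=Bounds(0, 1))         # binary selection variables

chosen = np.flatnonzero(res.x > 0.5)
print(chosen, res.fun)  # → [1 4] 2.5
```

Here the solver picks the lowest-memory strategy for each operator (indices 1 and 4, total memory 2.5) because their combined communication cost (3.8) still fits the budget; tightening the budget would force a trade toward lower-communication, higher-memory choices.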

Keywords: Deep learning; Automatic parallelism; Minimal memory redundancy

Classification: TN9 [Electronics and Telecommunications: Information and Communication Engineering]
