Architecture design, key technologies, and application research of a GPU heterogeneous resource pool platform based on virtualization  Cited by: 1

Authors: ZHANG Wancai; ZHANG Nan; YANG Wenqing; WANG Tao; ZHANG Wenqiang (Nari Technology Development Co., Ltd., Nanjing 211100, China)

Affiliation: [1] Nari Technology Development Co., Ltd., Nanjing 211100, Jiangsu, China

Source: Telecommunications Science, 2024, No. 9, pp. 162-175 (14 pages)

Funding: Science and Technology Project of State Grid Corporation of China (No. 524608210272).

Abstract: Computing resources for artificial intelligence are expensive and subject to market supply disruptions. The traditional single-card, single-use model results in low resource utilization and efficiency, and existing techniques struggle to support the efficient management and scheduling of diverse heterogeneous graphics processing unit (GPU) resources. To address this, a virtualization-based GPU heterogeneous resource pool platform is proposed. First, the overall architecture, logical architecture, and functional architecture of the platform are planned and designed. Second, the key technologies are studied, and a virtualized heterogeneous GPU resource pool framework and a scheduling model based on time slicing plus load balancing are proposed. Finally, based on the proposed methods, several application modes are put forward: multi-service single-card stacking, cross remote access ("cross-pull"), cross-machine integration, hybrid deployment, and time division multiplexing. The proposed approach provides enterprise-level AI applications with GPU computing resources that are compatible with GPUs from multiple vendors, support remote access, can be flexibly partitioned and aggregated, and can be scheduled elastically. According to the authors' calculations, for the same development and training workload, the number of GPU cards can be reduced by 60% and operating efficiency can be improved by a factor of four.
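
The abstract names a scheduling model that combines time slicing with load balancing but does not give its details. The following is only an illustrative sketch of how those two ideas can be combined in general, not the authors' model: tasks are placed on the least-loaded card, and each card then serves its queue round-robin in fixed time slices. The names `Task`, `Gpu`, `assign`, and `run`, the abstract time units, and the example workload are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    remaining: int  # work left, in abstract time units (hypothetical)


@dataclass
class Gpu:
    name: str
    queue: deque = field(default_factory=deque)

    def load(self) -> int:
        # Load is the total outstanding work queued on this card.
        return sum(t.remaining for t in self.queue)


def assign(tasks, gpus):
    """Load balancing: place each task on the currently least-loaded GPU."""
    for task in tasks:
        min(gpus, key=Gpu.load).queue.append(task)


def run(gpus, time_slice):
    """Time slicing: each GPU serves its queue round-robin, one slice per turn."""
    timeline = []
    while any(g.queue for g in gpus):
        for g in gpus:
            if not g.queue:
                continue
            task = g.queue.popleft()
            used = min(time_slice, task.remaining)
            task.remaining -= used
            timeline.append((g.name, task.name, used))
            if task.remaining > 0:
                g.queue.append(task)  # unfinished work rejoins the back of the queue
    return timeline


if __name__ == "__main__":
    gpus = [Gpu("gpu-a"), Gpu("gpu-b")]
    tasks = [Task("train", 5), Task("infer-1", 2), Task("infer-2", 2), Task("dev", 1)]
    assign(tasks, gpus)
    for entry in run(gpus, time_slice=2):
        print(entry)
```

In this sketch several small jobs end up sharing one card while a larger job occupies another, which is the general intuition behind replacing a single-card, single-use model with a shared pool. A real platform would also have to account for GPU memory, vendor differences, and remote access latency, none of which are modeled here.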

Keywords: GPU heterogeneous resource pool; computing power platform; virtualization; time slicing; load balancing

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
