Authors: ZHANG Wancai (张万才); ZHANG Nan (张楠); YANG Wenqing (杨文清); WANG Tao (王涛); ZHANG Wenqiang (张文强) (Nari Technology Development Co., Ltd., Nanjing 211100, China)
Affiliation: [1] Nari Technology Development Co., Ltd. (国电南瑞科技股份有限公司), Nanjing 211100, Jiangsu, China
Source: Telecommunications Science (《电信科学》), 2024, No. 9, pp. 162-175 (14 pages)
Funding: Science and Technology Project of State Grid Corporation of China (No. 524608210272).
Abstract: Artificial intelligence computing resources are currently expensive and subject to supply cutoffs. The traditional one-card, one-task usage model leads to low resource utilization and efficiency, and existing techniques struggle to support efficient management and scheduling of diverse heterogeneous graphics processing unit (GPU) resources. To address this, a virtualization-based heterogeneous GPU resource pool platform is proposed. First, the overall, logical, and functional architectures of the platform are planned and designed. Second, the key technologies are studied, and a virtualized heterogeneous GPU resource pool framework and a scheduling model based on time slicing + load balancing are proposed. Finally, building on these methods, several application modes are proposed: multi-service single-card stacking, cross remote access (cross-pull), cross-machine integration, hybrid deployment, and time-division multiplexing. The proposed approach gives enterprise-level AI applications GPU computing resources that are compatible with GPUs from multiple vendors, support remote access, can be flexibly partitioned and aggregated, and can be scheduled elastically. According to the authors' estimates and analysis, for the same development and training workload, the number of GPU cards can be reduced by 60% and operating efficiency can be improved by a factor of four.
Keywords: GPU heterogeneous resource pool; computing power platform; virtualization; time slicing; load balancing
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]