Authors: KOU Dazhi (寇大治) [1]; WEI Jianwen (韦建文); TANG Xiaoyong (唐小勇)
Affiliations: [1] Shanghai Supercomputer Center, Shanghai 201203, China; [2] Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, China; [3] School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, Hunan 410114, China
Source: Frontiers of Data & Computing (《数据与计算发展前沿》), 2022, No. 5, pp. 3-10 (8 pages)
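Funding: National Key Research and Development Program of China, "Application-Based Optimized Scheduling Methods and Implementation" (2018YFB0204004).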
Abstract: [Objective] Against the background of the "East-West Computing Requirement Transfer" (东数西算) project, supercomputing resources distributed across different regions need to be scheduled and managed more effectively. To address the uneven load across computing resources, with some busy and others idle, this work studies the runtime characteristics of typical application jobs and develops a multi-center task scheduling system, targeting the key technical problems of unified scheduling in the national high-performance computing environment. [Methods] First, the application run histories of several supercomputing centers were collected and a database of application run history was built. Second, by combining users' resource requests with an analysis of the resource usage characteristics of typical applications, a machine learning based framework was established to describe application characteristics accurately. Third, container-based migration of high-performance computing applications across clusters was implemented. Finally, task scheduling methods based on multi-center application characteristics were studied, and an application-aware global resource optimization scheduling system was developed. [Results] The system provides strong technical support for the service-oriented operation and stable running of the national high-performance computing environment. [Conclusions] The application-aware method for optimizing computing power scheduling is expected to effectively improve the reliability, availability, and maintainability of the "East-West Computing Requirement Transfer" project.
Keywords: high-performance computing system; historical database; application characteristics; computing power scheduling method
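Classification: TP38 [Automation and Computer Technology: Computer System Architecture]; TP181 [Automation and Computer Technology: Computer Science and Technology]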