Authors: Sun Zhenyu (孙震宇)[1], Shi Jingyan (石京燕)[1], Jiang Xiaowei (姜晓巍)[1], Zou Jiaheng (邹佳恒)[1], Du Ran (杜然)[1]
Source: Computer Science (《计算机科学》), 2017, No. 10, pp. 85-90 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11475210)
Abstract: High energy physics (HEP) data consist of physics events, and there is no correlation between events. A HEP computing task can therefore be parallelized by running a large number of jobs simultaneously, each processing a different data file, which makes HEP computing a typical high-throughput computing scenario. The computing cluster at the Institute of High Energy Physics (IHEP) uses the open-source TORQUE/Maui for resource management and job scheduling. IHEP enforces fairness by dividing the cluster's resources into multiple queues and limiting the maximum number of running jobs per user; however, this results in very low overall resource utilization of the cluster. SLURM and HTCondor are both popular open-source resource management systems: SLURM offers a rich set of job scheduling policies, while HTCondor is well suited to high-throughput computing. Either can replace the aging and poorly maintained TORQUE/Maui, and both are viable options for managing computing cluster resources. In this paper, the job submission behavior of users from the Daya Bay experiment was simulated on SLURM and HTCondor test clusters to measure the resource allocation behavior and efficiency of each system. The scheduling results were then compared with the actual scheduling results of the same jobs on the IHEP TORQUE/Maui cluster. Finally, the strengths and weaknesses of SLURM and HTCondor are analyzed, and the feasibility of using SLURM or HTCondor to manage the IHEP computing cluster is discussed.
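The abstract attributes the low utilization to the fairness mechanism itself: a cap on each user's running jobs can leave slots idle whenever only a few users have work queued. The toy discrete-time simulation below is a hypothetical sketch of that effect, not the paper's experimental setup. The function `simulate`, the slot count, the job counts, and the cap are all illustrative assumptions; every job is assumed to take the same number of ticks, and queue partitioning is not modeled.

```python
def simulate(slots, jobs_per_user, duration=1, max_running=None):
    """Toy discrete-time model of a batch cluster (hypothetical, for illustration).

    slots: number of identical job slots in the cluster.
    jobs_per_user: dict mapping user name -> number of queued jobs.
    duration: ticks each job runs (all jobs assumed identical).
    max_running: per-user cap on concurrently running jobs; None means no cap.
    Returns (makespan_in_ticks, utilization), where utilization is
    busy slot-ticks divided by total slot-ticks over the makespan.
    """
    pending = dict(jobs_per_user)
    running = []  # (user, remaining_ticks) for each occupied slot
    busy = 0
    t = 0
    while any(pending.values()) or running:
        # Dispatch one job per user per pass (round-robin) until slots are
        # full or no user can start another job under the cap.
        progress = True
        while len(running) < slots and progress:
            progress = False
            for user in pending:
                if len(running) >= slots:
                    break
                n_running = sum(1 for u, _ in running if u == user)
                under_cap = max_running is None or n_running < max_running
                if pending[user] > 0 and under_cap:
                    running.append((user, duration))
                    pending[user] -= 1
                    progress = True
        busy += len(running)  # every running job occupies one slot this tick
        # Advance one tick; jobs with one tick left finish now.
        running = [(u, r - 1) for u, r in running if r > 1]
        t += 1
    return t, busy / (slots * t)


if __name__ == "__main__":
    workload = {"heavy_user": 40, "light_user": 2}
    print(simulate(10, workload, max_running=4))  # (10, 0.42)
    print(simulate(10, workload))                 # (5, 0.84)
```

With 10 slots, one user holding 40 jobs and another holding 2, a per-user cap of 4 means the heavy user never occupies more than 4 slots, so the same 42 jobs take twice as long and more than half the slot-time sits idle. In this toy model, removing the cap doubles utilization; real schedulers such as SLURM or HTCondor pursue fairness through priorities rather than hard caps alone, which is one motivation the abstract gives for evaluating them.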
Keywords: resource management system; job scheduler; computing cluster; high-throughput computing; high energy physics computing
CLC number: TP319 [Automation and Computer Technology: Computer Software and Theory]