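Affiliations: [1] Yellow River Conservancy Technical Institute [2] Henan University
Source: Journal of Electronic Measurement and Instrumentation (《电子测量与仪器学报》), 2018, No. 3, pp. 135-141 (7 pages)
Funding: Supported by the Henan Province Science and Technology Research Project (172102210385); the Key Scientific Research Project of Higher Education Institutions of Henan Province (15A520023); the Humanities and Social Sciences Project of the Henan Provincial Department of Education (2017-ZZJH-340); and the Kaifeng Science and Technology Research Program Project (1703002)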
Abstract: To address the various faults that can occur in a grid computing environment, this paper proposes an online distributed fault-tolerant job scheduling algorithm. The algorithm consists of two main modules: a job scheduling and replica placement module, and a replica management module. The former is based on job replication: each replica of a job is scheduled independently at a different site, so underutilized idle resources can be used to run job replicas, with the aim that at least one replica completes successfully. The latter has each remote SRM that runs a job replica report the replica's status to the primary SRM (PSRM) at every monitoring interval. The PSRM periodically checks the application status table and then queries all remote SRMs for the status of their computing machines and networks. In this way it monitors the health of every job replica running within a site and provides fault tolerance. Experimental results show that, across the failure rates tested, the proposed algorithm achieves a better average job response time than other grid fault-tolerant scheduling algorithms and non-fault-tolerant scheduling algorithms.
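Keywords: grid computing environment; scheduling algorithm; fault tolerance; failure rate; average job response time
Classification: TP393.02 [Automation and Computer Technology: Computer Application Technology]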
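Illustrative sketch: The abstract describes replica status reporting and periodic checks by the primary SRM, but gives no implementation details. The Python sketch below is one possible reading of that idea, not the authors' algorithm. All names (`PrimarySRM`, `ReplicaEntry`, `missed_intervals_allowed`) are hypothetical, and the timeout rule used to declare a replica failed is an assumption made for illustration. The record says the PSRM queries remote SRMs for machine and network status; that step is replaced here by a simple missed-report check.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReplicaState(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ReplicaEntry:
    """One row of the (hypothetical) application status table."""
    site: str
    state: ReplicaState = ReplicaState.RUNNING
    last_report: float = 0.0


@dataclass
class PrimarySRM:
    """Hypothetical primary SRM that tracks the replicas of each job."""
    monitor_interval: float
    # Assumption: a replica is treated as failed after this many silent intervals.
    missed_intervals_allowed: int = 2
    table: dict = field(default_factory=dict)  # (job_id, site) -> ReplicaEntry

    def place_replicas(self, job_id, sites, now):
        """Register one replica of the job at each chosen site."""
        for site in sites:
            self.table[(job_id, site)] = ReplicaEntry(site=site, last_report=now)

    def report(self, job_id, site, state, now):
        """Called by a remote SRM once per monitoring interval."""
        entry = self.table[(job_id, site)]
        entry.state = state
        entry.last_report = now

    def check(self, now):
        """Periodic scan: mark running replicas that stopped reporting as failed."""
        deadline = self.monitor_interval * self.missed_intervals_allowed
        for entry in self.table.values():
            silent_for = now - entry.last_report
            if entry.state is ReplicaState.RUNNING and silent_for > deadline:
                entry.state = ReplicaState.FAILED

    def job_status(self, job_id):
        """A job succeeds as soon as any one of its replicas has completed."""
        states = [e.state for (j, _), e in self.table.items() if j == job_id]
        if not states:
            raise KeyError(job_id)
        if ReplicaState.COMPLETED in states:
            return ReplicaState.COMPLETED
        if all(s is ReplicaState.FAILED for s in states):
            return ReplicaState.FAILED
        return ReplicaState.RUNNING
```

Timestamps are passed in explicitly rather than read from a clock, so the behavior is easy to reason about: a job stays `RUNNING` while at least one replica is still reporting, and becomes `COMPLETED` when any single replica finishes.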