A Two-Step Data Placement and Task Scheduling Strategy for Optimizing Scientific Workflow Performance on Cloud Computing Platform (云环境下优化科学工作流执行性能的两阶段数据放置与任务调度策略)  Cited by: 65


Authors: Liu Shaowei (刘少伟)[1], Kong Lingmei (孔令梅)[2], Ren Kaijun (任开军)[1], Song Junqiang (宋君强)[1], Deng Kefeng (邓科峰)[1], Leng Hongze (冷洪泽)[1]

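Affiliations: [1] School of Computer Science, National University of Defense Technology; [2] Unit 78046 of the Chinese People's Liberation Army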

Source: Chinese Journal of Computers (《计算机学报》), 2011, No. 11, pp. 2121-2130 (10 pages)

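Funding: Supported by the National Natural Science Foundation of China project "Research on Algorithms for Rapid Service Composition and Optimized Execution in Dynamic Network Environments" (60903042) and the National High Technology Research and Development Program of China ("863" Program) project "Research on an Integrated Development Environment for Earth System Models and Its Demonstration Applications" (863-2010AA012404).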

Abstract: Scientific workflows in collaborative cloud environments are becoming increasingly popular, and executing them efficiently across geo-distributed data centers typically involves large volumes of data transfer. By exploiting data dependencies, we propose a correlation-based two-stage data placement strategy and a task scheduling strategy for efficient workflow execution. At workflow build time, the most closely related datasets are placed in the same data center wherever possible, based on the data dependency graph. At workflow runtime, each task is scheduled to the data center on which it has the greatest data dependency, and each newly generated dataset is placed in the data center with which it has the highest correlation. Experimental results show that the proposed strategy effectively reduces the volume of data transferred among data centers, thereby improving the execution efficiency of scientific workflows and lowering the cost of renting cloud resources.
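The abstract does not specify the algorithms themselves, so the following Python sketch is only an illustration of the two stages it describes, not the authors' method. It rests on several assumptions: the dependency between two datasets is taken to be the number of tasks that use both, placement is a simple greedy pass rather than whatever clustering the paper applies, and all names (`dependency`, `best_center`, `place_build_time`, `schedule_task`, `transfer_volume`) and the sample data are hypothetical.

```python
def dependency(tasks, a, b):
    """Assumed measure: number of tasks that use both datasets a and b."""
    return sum(1 for inputs in tasks.values() if a in inputs and b in inputs)


def best_center(d, datasets, tasks, placement, centers, capacity):
    """Pick the data center whose stored datasets depend most on dataset d."""
    used = {c: 0 for c in centers}
    for other, c in placement.items():
        used[c] += datasets[other]
    # Only data centers with enough free storage are candidates.
    candidates = [c for c in centers if used[c] + datasets[d] <= capacity[c]]
    if not candidates:
        raise ValueError(f"no data center has room for {d}")

    def score(c):
        return sum(dependency(tasks, d, o)
                   for o, oc in placement.items() if oc == c)

    # Highest dependency wins; ties go to the emptier data center.
    return max(candidates, key=lambda c: (score(c), -used[c]))


def place_build_time(datasets, tasks, centers, capacity):
    """Build-time stage (greedy sketch): place datasets largest first."""
    placement = {}
    for d in sorted(datasets, key=lambda name: -datasets[name]):
        placement[d] = best_center(d, datasets, tasks, placement,
                                   centers, capacity)
    return placement


def schedule_task(inputs, datasets, placement, centers):
    """Runtime stage: run the task where most of its input data already is."""
    local = {c: 0 for c in centers}
    for d in inputs:
        local[placement[d]] += datasets[d]
    return max(centers, key=lambda c: local[c])


def transfer_volume(inputs, datasets, placement, center):
    """Total size of input datasets that must be moved to `center`."""
    return sum(datasets[d] for d in inputs if placement[d] != center)


# Hypothetical sample data: dataset sizes and the inputs of each task.
datasets = {"d1": 10, "d2": 8, "d3": 2, "d4": 1}
tasks = {"t1": {"d1", "d2"}, "t2": {"d3", "d4"}, "t3": {"d1", "d2"}}
centers = ["dc1", "dc2"]
capacity = {"dc1": 25, "dc2": 25}

placement = place_build_time(datasets, tasks, centers, capacity)

# Runtime: a newly generated dataset d5 that a later task t4 uses with d3.
datasets["d5"] = 3
tasks["t4"] = {"d3", "d5"}
placement["d5"] = best_center("d5", datasets, tasks, placement,
                              centers, capacity)

chosen = schedule_task({"d1", "d3"}, datasets, placement, centers)
moved = transfer_volume({"d1", "d3"}, datasets, placement, chosen)
```

In this toy run, the datasets that are used together end up in the same data center, and a task whose inputs are split across data centers runs where the larger share of its data resides, so only the smaller dataset has to be transferred.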

Keywords: cloud computing; scientific workflow; data placement; data dependency; task scheduling

Classification: TP316 [Automation and Computer Technology: Computer Software and Theory]

 
