A Heuristic Cloud Resource Provisioning Method for Big Data Analytics Jobs  Cited by: 13

Heuristic Based Resource Provisioning Approach for Big Data Analytics in Cloud Environment


Authors: WU Yue-Wen, WU Heng [1], REN Jie, ZHANG Wen-Bo [1], WEI Jun [1,2,3], WANG Tao, ZHONG Hua [1,2,3] (Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; Science & Technology on Integrated Information System Laboratory (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)

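Affiliations: [1] Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190 [2] Science & Technology on Integrated Information System Laboratory (Institute of Software, Chinese Academy of Sciences), Beijing 100190 [3] University of Chinese Academy of Sciences, Beijing 100049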

Source: Journal of Software (软件学报), 2020, Issue 6, pp. 1860-1874 (15 pages)

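Funding: National Key Research and Development Program of China (2017YFB1400804); Beijing Natural Science Foundation (4182070); Ant Financial Research Fund (XZ502017000730); Youth Innovation Promotion Association of the Chinese Academy of Sciences, talent program (2018144).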

Abstract: Cloud computing has become the mainstream runtime environment for big data analytics jobs, and choosing a suitable cloud configuration to optimize the performance of recurring jobs remains a major challenge. Existing work mainly addresses the diversity of big data analytics frameworks (such as Hadoop and Spark) by using machine learning for resource provisioning, but with only a few test runs over a broad spectrum of cloud configurations, approaches such as CherryPick can settle on a locally optimal configuration. This paper presents RP-CH, a heuristic cloud resource provisioning approach based on workload classification for big data environments. Exploiting the shared nature of cloud resources, RP-CH collects runtime monitoring data and cloud resource configuration information from other big data analytics jobs, classifies a job by comparing its resource preference and usage with those of other jobs, and builds heuristic rules that relate workload classes to optimized cloud configurations. These rules are applied to the acquisition function of the Bayesian optimization algorithm to distinguish bad samples from good ones. Experiments with the HiBench and SparkBench benchmarks on Aliyun ECS show that, compared with CherryPick, RP-CH improves job performance by 58% on average and reduces resource cost by 44% on average.
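The idea of applying heuristic rules to the acquisition function can be illustrated with a minimal sketch. It is not the RP-CH implementation: the abstract does not give the actual rules, so the workload classes, the memory-per-core thresholds, the weights, and the names `heuristic_weight`, `pick_next`, and `CANDIDATES` are all assumptions made for illustration. The sketch uses the standard expected-improvement formula for a minimization objective (for example, job running time) and scales it by a rule-based weight, so that configurations the rules mark as a poor fit for the job's class are less likely to be sampled next.

```python
import math


def expected_improvement(mu, sigma, best):
    """Standard expected improvement for minimization, given a Gaussian
    prediction (mean mu, std sigma) and the best value observed so far."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def heuristic_weight(job_class, config):
    """Hypothetical rule table: return a factor in (0, 1] that down-weights
    configurations judged a poor fit for the job's workload class.
    The classes, thresholds, and factors are illustrative assumptions."""
    mem_per_core = config["mem_gb"] / config["vcpus"]
    if job_class == "memory_intensive" and mem_per_core < 4:
        return 0.1
    if job_class == "cpu_intensive" and mem_per_core > 4:
        return 0.5
    return 1.0


def pick_next(job_class, candidates, best):
    """Choose the next configuration to test. Each candidate is a tuple
    (config, predicted_mean, predicted_std) from some surrogate model."""
    def score(candidate):
        config, mu, sigma = candidate
        return heuristic_weight(job_class, config) * expected_improvement(mu, sigma, best)
    return max(candidates, key=score)[0]


# Made-up surrogate predictions of running time (seconds) for two configurations.
CANDIDATES = [
    ({"name": "small-mem", "vcpus": 8, "mem_gb": 16}, 90.0, 10.0),
    ({"name": "large-mem", "vcpus": 8, "mem_gb": 64}, 95.0, 10.0),
]

if __name__ == "__main__":
    for job_class in ("memory_intensive", "cpu_intensive"):
        chosen = pick_next(job_class, CANDIDATES, best=100.0)
        print(job_class, "->", chosen["name"])
```

Without the weight, plain expected improvement prefers `small-mem` for every job; with the assumed rule, a job classified as memory-intensive is steered to `large-mem` instead. A multiplicative weight is only one way to combine a rule with the acquisition value; the paper may combine them differently.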

Keywords: big data analytics; cloud computing; heuristic; cloud resource provisioning; Bayesian optimization

Classification code: TP316 [Automation and Computer Technology: Computer Software and Theory]

 
