Prediction Model of Execution Time for Batch Application in Spark

Times cited: 1

Authors: LI Shuo (李硕), LIANG Yi (梁毅) (Faculty of Information, Beijing University of Technology, Beijing 100124, China)

Source: Computer Engineering and Applications (《计算机工程与应用》), 2021, No. 5, pp. 79-87 (9 pages)

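Funding: National Key Research and Development Program of China (2017YFC0803300); General Program of the National Natural Science Foundation of China (91546111).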

Abstract: Predicting the execution time of Spark batch applications is a key technique for guiding resource allocation and application balancing in Spark systems. However, existing work applies a single unified prediction model to applications with different runtime characteristics and considers only a few factors when training that model, which reduces prediction accuracy. To address these problems, this paper proposes an execution time prediction model for Spark batch applications that accounts for differences in application characteristics. The model first classifies Spark batch applications based on strongly correlated metrics, and then applies the PCA and GBDT algorithms to predict execution time within each application category. When an ad-hoc application arrives, it is mapped to its application category and its execution time is predicted with the corresponding model. Experimental results show that, compared with a unified prediction model, the proposed method reduces the root mean square error (RMSE) and the mean absolute percentage error (MAPE) of the predictions by 32.1% and 33.9% on average, respectively.
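The abstract does not say how the categories are formed or exactly how PCA and GBDT are chained, so the following is only a minimal sketch of one plausible reading, assuming NumPy and scikit-learn. Metrics are ranked by absolute Pearson correlation with execution time, KMeans (an assumed choice; the paper's classification method is not given here) groups applications on the top-ranked metrics, and each group gets its own standardize, PCA, GBDT pipeline. An arriving application is routed to its group's model. The class `ClassifiedTimePredictor`, the parameters `n_classes` and `n_strong`, the 95% PCA variance threshold, and the synthetic data are all hypothetical. `rmse` and `mape` implement the two error measures reported in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must be non-zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


class ClassifiedTimePredictor:
    """Hypothetical sketch: classify applications, then fit one PCA + GBDT model per class."""

    def __init__(self, n_classes=2, n_strong=2, random_state=0):
        self.n_classes = n_classes  # assumed number of application categories
        self.n_strong = n_strong  # assumed number of strongly correlated metrics to keep
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y, float)
        # Rank metrics by |Pearson r| with execution time and keep the top n_strong.
        corr = np.nan_to_num([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
        self.strong_idx_ = np.argsort(corr)[::-1][: self.n_strong]
        # Group applications on the strong metrics (KMeans is an assumed choice).
        self.classifier_ = KMeans(n_clusters=self.n_classes, n_init=10,
                                  random_state=self.random_state)
        labels = self.classifier_.fit_predict(X[:, self.strong_idx_])
        # One standardize -> PCA (>= 95% explained variance) -> GBDT pipeline per class.
        self.models_ = {}
        for c in range(self.n_classes):
            mask = labels == c
            pipe = make_pipeline(StandardScaler(),
                                 PCA(n_components=0.95, svd_solver="full"),
                                 GradientBoostingRegressor(random_state=self.random_state))
            self.models_[c] = pipe.fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        # Map each ad-hoc application to a class, then use that class's model.
        labels = self.classifier_.predict(X[:, self.strong_idx_])
        y_pred = np.empty(len(X))
        for c, pipe in self.models_.items():
            mask = labels == c
            if mask.any():
                y_pred[mask] = pipe.predict(X[mask])
        return y_pred


# Synthetic demo data (hypothetical): two hidden application types whose
# execution time scales differently with an input-size-like metric.
rng = np.random.default_rng(0)
n = 200
kind = rng.integers(0, 2, n)
input_size = rng.uniform(1, 10, n) + 20 * kind
X = np.column_stack([input_size, rng.normal(size=n), rng.normal(size=n),
                     0.5 * input_size + rng.normal(size=n), rng.normal(size=n)])
y = 10 + np.where(kind == 1, 5.0, 2.0) * input_size + rng.normal(scale=1.0, size=n)

predictor = ClassifiedTimePredictor().fit(X[:150], y[:150])
pred = predictor.predict(X[150:])
y_test = y[150:]
print(f"RMSE={rmse(y_test, pred):.2f}  MAPE={mape(y_test, pred):.1f}%")
```

Fitting a single pipeline of the same shape on all applications would play the role of the unified baseline. The 32.1% and 33.9% reductions are the paper's own experimental results and are not expected to be reproduced by this toy data.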

Keywords: Spark; batch application; classification prediction

CLC number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
