Spark Intermediate Result Data Migration Strategy Based on Genetic Algorithm

Cited by: 1

Authors: LIANG Yi (梁毅)[1]; CHEN Jin-dong (陈金栋); SU Chao (苏超); BI Ling-feng (毕临风) (Computer Academy, Beijing University of Technology, Beijing 100124, China)

Affiliation: [1] Computer Academy, Beijing University of Technology, Beijing 100124

Source: Software Guide (《软件导刊》), 2020, No. 4, pp. 89-92 (4 pages)

Funding: National Natural Science Foundation of China (91646201, 91546111); National Key Research and Development Program of China (2017YFC0803300).

Abstract: Spark is a representative in-memory computing system for big data; it caches data in memory to accelerate iterative and interactive big data applications. Time-window-based data analysis is a typical class of iterative big data application. When such applications run on the Spark platform, intermediate result data is placed unevenly, which lowers application execution efficiency. To address this problem, the paper proposes a genetic-algorithm-based migration strategy for Spark intermediate result data. The strategy considers when to migrate intermediate result data and how much data to migrate, and uses a genetic algorithm to optimize the choice of where the migrated data is placed, improving the execution efficiency of time-window applications. Experimental results show that, on the existing Spark platform, the strategy reduces the execution time of time-window applications by up to 28.45% and by 21.59% on average.

Keywords: Spark; intermediate result data; data migration

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]
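The abstract says a genetic algorithm selects where migrated intermediate result data is placed, but it does not give the encoding, the fitness function, or the operators. The following is therefore only a minimal sketch under stated assumptions, not the authors' method: a chromosome assigns each data block to a node, fitness is the load of the most heavily loaded node, and the operators are elitist selection, single-point crossover, and random mutation. The names `imbalance` and `evolve_placement`, the block sizes, and all parameter values are hypothetical. Migration timing and migration data scale, which the paper also considers, are not modeled here.

```python
import random


def imbalance(placement, sizes, num_nodes):
    """Assumed fitness: load of the busiest node (lower is better)."""
    load = [0.0] * num_nodes
    for block, node in enumerate(placement):
        load[node] += sizes[block]
    return max(load)


def evolve_placement(sizes, num_nodes, pop_size=40, generations=200,
                     mutation_rate=0.05, seed=0):
    """Search for a block-to-node assignment with a simple genetic algorithm.

    All operators and parameters are illustrative assumptions; the paper's
    actual design is not described in the abstract.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(sizes)
    # Each chromosome is a list: index = block id, value = node id.
    population = [[rng.randrange(num_nodes) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: keep the better half unchanged.
        population.sort(key=lambda p: imbalance(p, sizes, num_nodes))
        survivors = population[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Single-point crossover (degenerates to a copy for n < 2).
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]
            # Mutation: reassign a block to a random node.
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(num_nodes)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda p: imbalance(p, sizes, num_nodes))


if __name__ == "__main__":
    block_sizes = [8, 7, 6, 5, 4, 3, 2, 1]  # hypothetical sizes, e.g. in MB
    best = evolve_placement(block_sizes, num_nodes=3)
    print(best, imbalance(best, block_sizes, 3))
```

Because the better half of each generation survives unchanged, the best fitness never gets worse from one generation to the next. A real deployment would likely also penalize the volume of data moved, which this sketch ignores.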
