Research on Optimization for Iteration-Intensive Applications on Spark

Cited by: 3

Authors: WEI Zhanchen (魏占辰), LIU Xiaoyu (刘晓宇), HUANG Qiulan (黄秋兰)[1], SUN Gongxing (孙功星)[1]

Affiliations: [1] Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2020, No. 23, pp. 68-73 (6 pages)

Funding: National Natural Science Foundation of China (No. 11775249, No. 11875283).

Abstract: Spark is a popular, widely applicable big data processing framework that is easy to use and scales well. In practice, however, some problems remain. In some iterative computing scenarios, for example, the speedup is unsatisfactory because using a distributed system such as Spark introduces substantial additional overhead. To analyze and reduce this overhead accurately, this paper proposes a Spark efficiency formula that measures the additional overhead by the distributed calculation cost and measures execution efficiency by the effective calculation ratio. Based on this formula, the paper also designs and implements an optimization strategy for iteration-intensive applications on Spark. Test results show that both the effective calculation ratio and execution performance improve substantially: the effective calculation ratio increases by about 0.373 and execution time decreases by about 68.2%.

Keywords: Spark; optimization of iteration-intensive applications; distributed calculation cost; effective calculation ratio

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
