MapReduce Job Performance Tuning by Optimizing Memory Configurations  Cited by: 6

Authors: Luo Yonggang (罗永刚)[1], Chen Xingshu (陈兴蜀)[1], Yang Lu (杨露)[1]

Affiliation: [1] Cybersecurity Research Institute, Sichuan University, Chengdu 610065, Sichuan, China

Source: Journal of South China University of Technology (Natural Science Edition), 2017, No. 1, pp. 102-111 (10 pages)

Funding: National Key Technology R&D Program of China (2012BAH18B05); National Natural Science Foundation of China (61272447)

Abstract: MapReduce job performance depends heavily on memory configuration. Because the memory requirement of a MapReduce job is difficult to predict accurately, a generational memory prediction method is proposed, based on the fact that the Java Virtual Machine (JVM) divides the garbage-collected heap into young and old generations. First, a regression model is constructed that relates young generation size to average garbage collection time. The problem of finding a reasonable young generation size is then converted into a constrained nonlinear optimization problem, and a fixed-size search algorithm is designed to solve it. Moreover, models relating the performance of the Map and Reduce tasks to memory are constructed and solved for the memory requirement of optimal performance, which yields the old generation size of the Map and Reduce tasks. Finally, a k-means clustering algorithm is used to predict the value of the parameter PretenureSizeThreshold, and the JVM configuration is tuned to reduce garbage collection pause time. Experimental results show that the proposed method accurately predicts the memory requirements of the Map and Reduce tasks of MapReduce jobs and significantly improves job performance.
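The abstract does not give the form of the regression model, the constraints, or the search procedure, so the following is only a minimal sketch of the general idea under stated assumptions: a quadratic least-squares fit stands in for the regression model, a fixed-step scan over a bounded range stands in for the search algorithm, and the sample measurements are made up. The function names (`fit_quadratic`, `search_young_size`, `jvm_opts`) are hypothetical. The resulting option string uses the standard HotSpot flags `-Xmn`, `-Xmx`, and `-XX:PretenureSizeThreshold`, and would typically be supplied through a Hadoop property such as `mapreduce.map.java.opts`.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*u + c2*u^2, with u = x / max(xs).

    Assumption: a quadratic is used only for illustration; the paper's
    actual regression model is not stated in the abstract.
    """
    scale = max(xs)
    us = [x / scale for x in xs]  # rescale inputs to keep the system well conditioned
    # Augmented normal equations (A^T A | A^T y) for the basis 1, u, u^2.
    rows = [[sum(u ** (i + j) for u in us) for j in range(3)]
            + [sum(y * u ** i for u, y in zip(us, ys))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    c = [rows[i][3] / rows[i][i] for i in range(3)]
    return lambda x: c[0] + c[1] * (x / scale) + c[2] * (x / scale) ** 2


def search_young_size(model, lo_mb, hi_mb, step_mb):
    """Fixed-step scan of [lo_mb, hi_mb] for the size with the lowest predicted GC time.

    Assumption: the only constraint modeled here is the size range itself.
    """
    return min(range(lo_mb, hi_mb + 1, step_mb), key=model)


def jvm_opts(young_mb, old_mb, pretenure_bytes):
    """Build a HotSpot option string; total heap is young plus old generation."""
    return (f"-Xmn{young_mb}m -Xmx{young_mb + old_mb}m "
            f"-XX:PretenureSizeThreshold={pretenure_bytes}")


# Made-up measurements: young generation size (MB) vs. average GC time (ms).
sizes_mb = [100, 200, 300, 400, 500, 600, 700, 800]
gc_ms = [110, 60, 30, 20, 30, 60, 110, 180]

model = fit_quadratic(sizes_mb, gc_ms)
best_young_mb = search_young_size(model, 100, 800, 50)
# The old generation size and pretenure threshold below are placeholder values.
print(jvm_opts(best_young_mb, 600, 1048576))
```

In the paper, the old generation size comes from the Map and Reduce task memory models and the pretenure threshold from k-means clustering; here both are passed in as plain numbers because the abstract gives no further detail.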

Keywords: big data; MapReduce; garbage collection; memory allocation; performance optimization

Classification number: TP393.09 [Automation and Computer Technology: Computer Application Technology]

 
