IO dependent SSD cache allocation for elastic Hadoop applications (Cited by: 1)

Authors: Zhen TANG, Wei WANG, Lei SUN, Yu HUANG, Heng WU, Jun WEI, Tao HUANG

Affiliations: [1] State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100080, China; [3] Tianjin Massive Data Processing Technology Laboratory, Tianjin Shenzhou General Data Technology Co., Ltd., Tianjin 300384, China; [4] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Source: Science China (Information Sciences) [中国科学（信息科学）（英文版）, English edition], 2018, No. 5, pp. 51-67 (17 pages)

Funding: Supported by the National Key Research and Development Program of China (Grant No. 2016YFB1000103); the National Natural Science Foundation of China (Grant No. 61572480); the Tianjin Massive Data Processing Technology Laboratory; and the Youth Innovation Promotion Association, Chinese Academy of Sciences (Grant No. 2015088)

Abstract: Elastic Hadoop applications consisting of multiple virtual machines (VMs) are widely used to support big data analysis and processing. In this scenario, flash-based solid state drives (SSDs) are usually deployed on hypervisors and used as a cache to improve IO performance. However, existing SSD caching schemes are mostly VM-centric: they focus on the low-level IO performance metrics of individual VMs. They may not optimize the performance of elastic Hadoop applications, i.e., the job completion time (JCT), because the importance of the VMs inside an application differs even when they have similar low-level IO patterns. Considering the IO dependency among VMs and determining their importance, which we regard as application-centric metrics, may improve performance further. We present an IO dependency based requirement model to characterize the SSD cache requirement of each VM inside an elastic Hadoop application, and then use it in a genetic algorithm (GA) based approach to calculate nearly optimal VM weights for allocating the per-VM SSD cache space and the capacity of I/O operations per second (IOPS). Furthermore, we present a tool, AC-SSD, based on this approach and introduce closed-loop adaptation to react to continuously changing workloads. The evaluation shows that, compared with a shared cache, AC-SSD reduces the JCT by up to 39% for IO sensitive workloads, by up to 29% for continuously changing workloads, and by over 12.5% for different scales of data.
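The abstract does not spell out how per-VM weights are turned into concrete allocations. As a rough illustration only, the sketch below assumes a simple proportional split of a fixed SSD cache budget and IOPS budget across VMs. The function name `allocate_by_weight`, the VM names, and the budgets are all hypothetical, and neither the genetic algorithm that produces the weights nor the actual AC-SSD implementation is reproduced here.

```python
import math
from fractions import Fraction
from typing import Dict


def allocate_by_weight(weights: Dict[str, float], total: int) -> Dict[str, int]:
    """Split an integer budget across VMs in proportion to their weights.

    Assumption: weights are non-negative and would come from some external
    optimizer (a GA in the paper); this function only performs the split.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and non-negative")

    # Exact rational arithmetic avoids rounding surprises in the shares.
    exact = {vm: Fraction(w) for vm, w in weights.items()}
    weight_sum = sum(exact.values())
    if weight_sum == 0:
        raise ValueError("at least one weight must be positive")

    # Give every VM the floor of its ideal share first.
    ideal = {vm: total * w / weight_sum for vm, w in exact.items()}
    shares = {vm: math.floor(x) for vm, x in ideal.items()}

    # Hand the remaining units to the VMs with the largest fractional parts.
    leftover = total - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda vm: ideal[vm] - shares[vm], reverse=True)
    for vm in by_remainder[:leftover]:
        shares[vm] += 1
    return shares


# Hypothetical weights for a three-VM Hadoop cluster.
vm_weights = {"namenode": 1.0, "datanode-1": 3.0, "datanode-2": 4.0}
cache_gb = allocate_by_weight(vm_weights, 64)    # {'namenode': 8, 'datanode-1': 24, 'datanode-2': 32}
iops = allocate_by_weight(vm_weights, 20000)     # {'namenode': 2500, 'datanode-1': 7500, 'datanode-2': 10000}
```

Largest-remainder rounding is used so that the integer shares always sum to the full budget; a closed-loop variant would simply recompute the weights and call the function again when the workload changes.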

Keywords: Hadoop; SSD cache; resource management; virtualization

Classification: TP317 [Automation and Computer Technology: Computer Software and Theory]
