基于混合内存的Apache Spark缓存系统实现与优化  被引量:4

Implementation and Optimization of Apache Spark Cache System Based on Mixed Memory

在线阅读下载全文

作  者:魏森 周浩然 胡创[1] 程大钊 WEI Sen;ZHOU Haoran;HU Chuang;CHENG Dazhao(School of Computer Science,Wuhan University,Wuhan 430072,China)

机构地区:[1]武汉大学计算机学院,武汉430072

出  处:《计算机科学》2023年第6期10-21,共12页Computer Science

基  金:之江实验室开放课题(K2022PI0AB01);湖北珞珈实验室专项基金资助项目(220100016)。

摘  要:随着大数据时代数据规模的激增,内存计算框架得到了长足发展。主流内存计算框架Apache Spark使用内存来缓存中间结果,大幅度地提升了数据处理速度。同时,具有较快的读写速度和较大容量的非易失性存储器NVM在内存计算领域展现出了巨大的发展前景,使用DRAM和NVM构建Spark混合缓存系统成为一种可行方案。文中提出了一种基于DRAM-NVM混合内存的Spark缓存系统,该系统选择平面混合缓存模型作为设计方案,然后为缓存块管理系统设计了专用的数据结构,并提出了适用于Spark的混合缓存系统整体设计架构。另外,为了将频繁访问的缓存块保存在DRAM缓存中,提出了基于缓存块最小重用代价的混合缓存管理策略。首先从DAG信息中获取RDD的未来重用次数,未来重用次数多的缓存块将被优先保存在DRAM缓存中,并在缓存块迁移时考虑了迁移成本。设计实验表明,DRAM-NVM混合缓存相比原有缓存系统的性能平均提升了53.06%,对于相同的混合内存,所提策略相比默认缓存策略有平均35.09%的提升。同时,使用文中设计的混合系统只需要1/4的DRAM和3/4的NVM作为缓存,就能达到全部DRAM缓存约79%的性能表现。With increasing data scale in the“big data era”,in-memory computing frameworks have grown significantly.The mainstream in-memory computing framework Apache Spark uses memory to cache intermediate results,which greatly improves data processing performance.At the same time,non-volatile memory(NVM)with fast read and write performance has great development prospects in the field of in-memory computing,so there is huge promise in building Spark's cache with a mix of DRAM and NVM.In this paper,a Spark cache system based on DRAM-NVM hybrid memory is proposed,which selects the flat hybrid cache model as the design scheme,and then designs a dedicated data structure for the cache block management system,and proposes the overall design architecture of the hybrid cache system for Spark.In addition,in order to save frequently accessed cache blocks in the DRAM cache,a hybrid cache management strategy based on the minimum reuse cost of cache blocks is proposed.First,the future reuse of RDD is obtained from the DAG information,and the cache blocks with high future reuse times will be stored in the DRAM cache first,and the migration cost is considered when the cache block is migrated.The design experiments show that the DRAM-NVM hybrid cache has an average performance improvement of 53.06%compared to the original cache system,and the proposed strategy has an average improvement of 35.09%compared to the default cache strategy for the same hybrid memory.At the same time,the hybrid system designed in this paper only needs 1/4 of the DRAM and 3/4 of the NVM as the cache,and the running time of the total DRAM cache can be achieved by 85.49%.

关 键 词:SPARK 缓存管理策略 NVM 混合内存 

分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象