Performance optimization of distributed file system based on new type storage devices

Cited by: 6

Authors: DONG Cong [1,2]; ZHANG Xiao; CHENG Wendi [2,3]; SHI Jia [1,2]

Affiliations: [1] School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China; [2] Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology (Northwestern Polytechnical University), Xi'an, Shaanxi 710129, China; [3] College of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China

Source: Journal of Computer Applications (《计算机应用》), 2020, No. 12, pp. 3594-3603 (10 pages)

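Funding: National Key Research and Development Program of China (2018YFB1004400); Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund (L192027).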

Abstract: The I/O performance of new storage devices is usually an order of magnitude higher than that of traditional Solid State Drives (SSDs). However, a distributed file system running on new storage devices shows no significant performance gain over one running on SSDs, which indicates that current distributed file systems cannot fully exploit these devices. To address this problem, the data write path and transmission process of the Hadoop Distributed File System (HDFS) were analyzed quantitatively. Measuring the time spent in each stage of an HDFS write showed that data transmission between nodes accounts for a large share of the total. An optimization scheme was therefore proposed: asynchronous writes parallelize data transmission and data processing, so that the processing stages of different data packets overlap. This shortens the overall packet processing time and improves HDFS write performance. Experimental results show that the proposed scheme improves HDFS write throughput by 15% to 24% and reduces overall write execution time by 28% to 36%.
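The idea of overlapping transmission and processing across packets can be illustrated with a small sketch. This is not the authors' implementation and not HDFS code. The function names, the two-stage split (forward a packet downstream, then persist it locally), and the constant per-stage costs are all assumptions made for illustration.

```python
import queue
import threading


def sequential_time(n, t_transfer, t_write):
    """Total time when each packet is transferred and then written
    before the next packet starts (no overlap between packets)."""
    return n * (t_transfer + t_write)


def pipelined_time(n, t_transfer, t_write):
    """Total time for an idealized two-stage pipeline with constant
    stage costs: the first packet pays both stages, and every further
    packet costs only the slower of the two stages."""
    if n == 0:
        return 0
    return t_transfer + t_write + (n - 1) * max(t_transfer, t_write)


def handle_packets_async(packets, forward, persist):
    """Forward each packet immediately and hand it to a writer thread,
    so the caller never waits for the local write of one packet before
    forwarding the next. `forward` and `persist` are hypothetical
    callbacks standing in for network send and disk write."""
    pending = queue.Queue()
    written = []

    def writer():
        while True:
            pkt = pending.get()
            if pkt is None:  # sentinel: no more packets
                break
            written.append(persist(pkt))

    worker = threading.Thread(target=writer)
    worker.start()

    forwarded = []
    for pkt in packets:
        forwarded.append(forward(pkt))  # transmission stage
        pending.put(pkt)                # queue for the write stage
    pending.put(None)
    worker.join()
    return forwarded, written
```

With hypothetical costs of 3 time units per transfer and 2 per write, four packets take 20 units when handled strictly one after another and 14 units in the idealized pipeline. The queue preserves packet order, so the write stage sees packets in the same order in which they were forwarded.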

Keywords: distributed file system; Hadoop Distributed File System (HDFS); non-volatile memory; performance optimization; asynchronous write

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
