Authors: DONG Cong[1,2], ZHANG Xiao, CHENG Wendi[2,3], SHI Jia[1,2]
Affiliations: [1] School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China; [2] Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology (Northwestern Polytechnical University), Xi'an, Shaanxi 710129, China; [3] College of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China
Source: Journal of Computer Applications (《计算机应用》), 2020, No. 12, pp. 3594-3603 (10 pages)
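Funding: National Key Research and Development Program of China (2018YFB1004400); Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund (L192027).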
Abstract: The I/O performance of emerging storage devices is usually an order of magnitude higher than that of traditional solid-state drives (SSDs). However, a distributed file system built on such devices does not perform significantly better than one built on SSDs, which indicates that current distributed file systems cannot fully exploit the performance of the new devices. To address this problem, the data write path and transmission process of the Hadoop Distributed File System (HDFS) were analyzed quantitatively. Measuring the time spent in each stage of the HDFS write process showed that data transmission between nodes accounts for a large share of the total. An optimization scheme was therefore proposed: asynchronous writes parallelize data transmission and data processing, so that the processing stages of different data packets overlap, which reduces the overall packet processing time and improves HDFS write performance. Experimental results show that the proposed scheme increases HDFS write throughput by 15% to 24% and reduces the overall write execution time by 28% to 36%.
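The overlap described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation and does not use any HDFS API. It assumes a two-stage write path (transfer, then processing) with fixed per-packet costs, and the function names `sequential_time`, `pipelined_time` and `async_write` are hypothetical. The time units are arbitrary and unrelated to the measurements reported in the paper.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the packet stream


def sequential_time(n_packets, t_transfer, t_process):
    """Total time when each packet is transferred and then processed, one at a time."""
    return n_packets * (t_transfer + t_process)


def pipelined_time(n_packets, t_transfer, t_process):
    """Total time when the transfer of one packet overlaps the processing of the previous one."""
    if n_packets == 0:
        return 0
    # The first packet pays for both stages; each later packet costs only the slower stage.
    return t_transfer + t_process + (n_packets - 1) * max(t_transfer, t_process)


def async_write(packets, process):
    """Hand packets to a background worker so the caller never waits on processing."""
    pending = queue.Queue()
    results = []

    def worker():
        while True:
            packet = pending.get()
            if packet is _DONE:
                break
            results.append(process(packet))

    thread = threading.Thread(target=worker)
    thread.start()
    for packet in packets:
        pending.put(packet)  # returns immediately; processing happens in the worker
    pending.put(_DONE)
    thread.join()
    return results


if __name__ == "__main__":
    # Assumed costs: 3 time units to transfer a packet, 2 to process it.
    print(sequential_time(100, 3, 2))       # 500
    print(pipelined_time(100, 3, 2))        # 302
    print(async_write([b"a", b"bc"], len))  # [1, 2]
```

In this model the saving grows with the number of packets and is largest when the two stage costs are similar, since the slower stage sets the steady-state rate.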
Keywords: distributed file system; Hadoop Distributed File System (HDFS); non-volatile memory; performance optimization; asynchronous write
Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]