Optimized Storage Strategy Research of HDFS Based on Vandermonde Code

Cited by: 18


Authors: Song Baoyan (宋宝燕)[1], Wang Junlu (王俊陆)[1], Wang Yan (王妍)[1]

Affiliation: [1] College of Information Science and Technology, Liaoning University, Shenyang 110036

Source: Chinese Journal of Computers (《计算机学报》), 2015, No. 9, pp. 1825-1837 (13 pages)

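Funding: National Natural Science Foundation of China (61472169; 60873068); Liaoning Provincial Department of Education Excellent Talents Support Program (LR201017)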

Abstract: With the arrival of the big data era, HDFS (Hadoop Distributed File System) is being used ever more widely. It nevertheless suffers from high overall storage cost, limited scalability, and inadequate load balancing across nodes. This paper therefore proposes a decentralized dynamic-replica storage optimization strategy for HDFS based on the Vandermonde code. Because HDFS is mostly deployed on large clusters of inexpensive hardware, the strategy builds on Vandermonde-code optimization and applies decentralized dynamic replica control to systematically improve the calculation process, calculation mode, and decoding trigger strategy of HDFS file operations, and it sets check codes dynamically to keep fault tolerance within a desirable range. In addition, it uses Galois finite field theory to optimize the encoding and decoding operations and the calculation method of the Vandermonde code. Without affecting the HDFS storage structure, the approach reduces the time cost and memory pressure of Vandermonde encoding and decoding, saves about 30% of HDFS storage overhead, improves data reliability by about 200% (the record's English abstract gives 2000%), balances the load across HDFS nodes, and raises decoding recovery efficiency by about 40% on average. The result is a complete, systematic optimization scheme that offers an effective path for the future development of HDFS.
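To make the abstract's central idea concrete, the following is a minimal sketch of a Vandermonde erasure code over the Galois field GF(2^8). It is not the paper's implementation: the primitive polynomial `0x11D`, the evaluation points `1..n`, the non-systematic layout, and every function name here are assumptions chosen for illustration. It shows only the generic technique: k data blocks are expanded into n coded blocks, and any k surviving blocks suffice to recover the data.

```python
# Illustrative sketch only; names and parameters are assumptions, not the paper's.
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a commonly used primitive polynomial

# Build exp/log tables for GF(2^8) with generator 2.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so gf_mul needs no modulo


def gf_mul(a, b):
    """Multiply two field elements via log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a nonzero field element."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return EXP[255 - LOG[a]]


def gf_pow(a, e):
    """Raise a nonzero field element to a non-negative power."""
    return 1 if e == 0 else EXP[(LOG[a] * e) % 255]


def vandermonde_row(i, k):
    """Row i of the Vandermonde matrix, using evaluation point i + 1."""
    return [gf_pow(i + 1, j) for j in range(k)]


def gf_mat_inv(m):
    """Invert a square matrix over GF(2^8) by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(v, inv) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                # Subtraction in GF(2^8) is XOR.
                aug[r] = [v ^ gf_mul(f, p) for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def combine(matrix, blocks):
    """Apply a coefficient matrix to equal-length byte blocks, byte by byte."""
    length = len(blocks[0])
    out = []
    for row in matrix:
        acc = bytearray(length)
        for coef, block in zip(row, blocks):
            for t in range(length):
                acc[t] ^= gf_mul(coef, block[t])
        out.append(bytes(acc))
    return out


def encode(data_blocks, n):
    """Expand k data blocks into n coded blocks (n <= 255)."""
    k = len(data_blocks)
    return combine([vandermonde_row(i, k) for i in range(n)], data_blocks)


def decode(shares, k):
    """Recover the k data blocks from any k coded blocks ({index: block})."""
    idx = sorted(shares)[:k]
    inverse = gf_mat_inv([vandermonde_row(i, k) for i in idx])
    return combine(inverse, [shares[i] for i in idx])


# Usage: 3 data blocks, 5 coded blocks, blocks 0 and 2 lost.
data = [b"hdfs", b"data", b"blok"]
coded = encode(data, 5)
recovered = decode({1: coded[1], 3: coded[3], 4: coded[4]}, 3)
```

In this toy configuration, 5 coded blocks are stored for 3 blocks of data (about 1.67 times the raw size) while tolerating the loss of any 2 blocks, compared with 3 times the raw size for triple replication. These numbers illustrate the general trade-off only and are not the paper's measured results.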

Keywords: big data; HDFS; Vandermonde code; decentralized dynamic replica; optimized storage

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
