Improvement of Backup Data Placement Policy of Hadoop (Hadoop备份数据存放策略的改进)  Cited by: 3


Authors: ZHOU Chang-jun (周长俊) [1]; ZONG Ping (宗平) [2]

Affiliations: [1] School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; [2] School of Overseas Education, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China

Source: Computer Technology and Development (《计算机技术与发展》), 2019, No. 1, pp. 11-16 (6 pages)

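Funding: National "863" High-Tech Development Program of China (2006AA01Z208); Natural Science Basic Research Project of Jiangsu Higher Education Institutions (06KJB520079)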

Abstract: Under Hadoop's default backup data placement policy, once the local replica fails, the data must be recovered from the backup stored on a remote rack. Because the default policy chooses the backup storage node at random, backup data can end up unevenly distributed across nodes, and recovery can consume a large amount of internal bandwidth when the chosen node is far away. To address these problems, an improved backup data placement policy is proposed. The policy takes into account the distance between nodes, the load of each node, and the number of times the backup data has been recovered. From these factors it computes a matching degree for every node and selects the node with the highest matching degree as the optimal node for storing backup data on a remote rack. The policy balances the backup data load across nodes, limits the internal bandwidth consumed during recovery, and, by considering how often a replica has failed, enables fast recovery of frequently failing replicas. The improved policy was implemented on the Hadoop platform, and the results met the expected requirements.
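The abstract names the three inputs to the matching degree (node distance, node load, recovery count) but does not give the formula. The following Python sketch is one possible reading, not the authors' method: the `Node` fields, the normalization, the weights, and the way the recovery count shifts weight toward distance are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    distance: int   # assumed: network distance from the writer node
    load: float     # assumed: fraction of storage in use, 0.0 to 1.0


def matching_degree(node: Node, recovery_count: int, max_distance: int) -> float:
    """Hypothetical score: higher means a better remote-rack backup target."""
    closeness = 1.0 - node.distance / max_distance   # nearer nodes score higher
    free = 1.0 - node.load                           # lightly loaded nodes score higher
    # Assumed: data that is recovered often should weight distance more heavily.
    hotness = recovery_count / (recovery_count + 1.0)
    w_dist = 0.25 + 0.5 * hotness
    w_load = 1.0 - w_dist
    return w_dist * closeness + w_load * free


def select_node(candidates: list[Node], recovery_count: int) -> Node:
    """Pick the candidate with the highest matching degree."""
    max_distance = max(n.distance for n in candidates)
    return max(
        candidates,
        key=lambda n: matching_degree(n, recovery_count, max_distance),
    )


candidates = [
    Node("near", distance=4, load=0.8),
    Node("far", distance=6, load=0.2),
]
cold_choice = select_node(candidates, recovery_count=0)
hot_choice = select_node(candidates, recovery_count=9)
```

Under these assumed weights, a block that has never been recovered goes to the lightly loaded `far` node, while a block recovered nine times goes to the closer `near` node despite its higher load. This mirrors the trade-off the abstract describes between load balancing and recovery bandwidth.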

Keywords: Hadoop; backup data placement policy; internal bandwidth; load balancing; hot data

Classification No.: TP31 [Automation and Computer Technology: Computer Software and Theory]

 
