Concurrent Node Reconstruction for Erasure-Coded Storage Clusters  Cited by: 1


Authors: Huang Jianzhong (黄建忠)[1], Cao Qiang (曹强)[1], Huang Siti (黄思倜)[1], Xie Changsheng (谢长生)[1]

Affiliation: [1] Wuhan National Laboratory for Optoelectronics (Huazhong University of Science and Technology), Wuhan 430074

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2016, No. 9, pp. 1918-1929 (12 pages)

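Funding: National Natural Science Foundation of China (61572209); National High Technology Research and Development Program of China ("863" Program) (2013AA013203); National Key Basic Research and Development Program of China ("973" Program) (2011CB302303)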

Abstract: A key design goal of erasure-coded storage clusters is to reduce the network traffic incurred by reconstruction I/Os, because less network traffic shortens reconstruction time, which in turn improves system reliability. An interleaved reconstruction scheme (IRS) is proposed for concurrently recovering two or more failed nodes, in which all replacement nodes rebuild all failed blocks collaboratively and in parallel. An analysis of the I/O flows of the existing centralized reconstruction scheme (CRec) and decentralized reconstruction scheme (DRec) reveals that the reconstruction performance bottleneck lies in the storage manager for CRec and in the replacement nodes for DRec. IRS improves on CRec and DRec in two respects: 1) replacement nodes act as rebuilding nodes and handle reconstruction I/Os in parallel, thereby removing the storage manager bottleneck of CRec; 2) all replacement nodes collaboratively rebuild all failed blocks, exploiting structural properties of erasure codes so that each required surviving block is transferred only once during reconstruction, which yields high reconstruction I/O parallelism. The three reconstruction schemes (CRec, DRec, and IRS) are implemented on (k+r, k) Reed-Solomon-coded storage clusters and compared by replaying real-world I/O traces. Experimental results show that, for an erasure-coded storage cluster with parameters k=9 and r=3, IRS outperforms both CRec and DRec in reconstruction time by factors of at least 1.63 and 2.14 for double-node and triple-node on-line reconstruction, respectively.
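The statement that each surviving block needs to be transferred only once rests on a general property of (k+r, k) Reed-Solomon codes: any k surviving blocks of a stripe are enough to rebuild every failed block of that stripe. The following is a minimal Python sketch of that property only, not the paper's IRS implementation. It assumes a toy code over the prime field GF(257) with one symbol per block (production systems typically work over GF(2^8)), and the names P, interpolate_at, encode, and rebuild are hypothetical.

```python
# Toy (k+r, k) Reed-Solomon-style code over GF(257); illustrative assumption only.
P = 257  # prime modulus chosen for readability, not taken from the paper


def interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < k through k (xi, yi) pairs, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, r):
    """Systematic encoding: data symbols sit at x = 0..k-1, parity at x = k..k+r-1."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [interpolate_at(points, x) for x in range(k, k + r)]


def rebuild(stripe, failed, k):
    """Rebuild all failed positions from one shared set of k surviving symbols.

    Returns the rebuilt symbols and the number of surviving symbols read.
    """
    survivors = [(x, y) for x, y in enumerate(stripe) if x not in failed][:k]
    rebuilt = {x: interpolate_at(survivors, x) for x in failed}
    return rebuilt, len(survivors)


if __name__ == "__main__":
    k, r = 9, 3  # coding parameters quoted in the abstract
    stripe = encode([10, 20, 30, 40, 50, 60, 70, 80, 90], r)
    failed = {1, 4, 10}  # a triple-node failure
    rebuilt, reads = rebuild(stripe, failed, k)
    print(rebuilt == {x: stripe[x] for x in failed}, reads)
```

In this sketch the three failed symbols are all recovered from the same nine surviving symbols, so the read count stays at k regardless of how many nodes failed (up to r). How IRS distributes the decoding work across replacement nodes is not specified in the abstract and is not modelled here.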

Keywords: erasure coding; cluster storage; storage reliability; node reconstruction; interleaved reconstruction

Classification: TP333 [Automation and Computer Technology: Computer System Architecture]

 
