A content aware chunking scheme for data de-duplication in archival storage systems  

Authors: Nie Xuejun, Qin Leihua, Zhou Jingli

Affiliation: [1] College of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan 430074, P.R. China

Source: High Technology Letters (高技术通讯(英文版)), 2012, No. 1, pp. 45-50 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 60673001) and the State Key Development Program of Basic Research of China (No. 2004CB318203).

Abstract: Based on variable-sized chunking, this paper proposes a content aware chunking scheme, called CAC, that does not assume fully random file contents but considers the characteristics of the file types. CAC uses a candidate anchor histogram and file-type-specific knowledge to refine how anchors are determined when de-duplicating file data, and it enforces the selected average chunk size. CAC finds more chunks, which in turn yields smaller average chunks and a better reduction in data. We present a detailed evaluation of CAC; the experimental results show that the scheme can improve the compression ratio for file types whose bytes are not randomly distributed (from 11.3% to 16.7%, depending on the dataset) and improve write throughput by 9.7% on average.
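The abstract builds on variable-sized (content-defined) chunking, so a minimal sketch of that baseline may help make the terms "anchor" and "average chunk size" concrete. This is not the CAC algorithm: the candidate anchor histogram and the file-type rules are not described in this record and are not reproduced here. The window size, divisor, size limits, and the use of a BLAKE2b digest instead of a rolling fingerprint are all illustrative assumptions.

```python
import hashlib
import random

# All four parameters are assumed values chosen for illustration only.
WINDOW = 16      # bytes hashed at each position
DIVISOR = 64     # on random data, roughly 1 in 64 positions is an anchor
MIN_SIZE = 32    # smallest chunk allowed (must be >= WINDOW)
MAX_SIZE = 256   # a cut is forced once a chunk grows this large


def chunk(data: bytes) -> list[bytes]:
    """Split data at anchors chosen from the content itself."""
    chunks = []
    start = 0
    for i in range(len(data)):
        size = i - start + 1
        if size < MIN_SIZE:
            continue
        # Hash the last WINDOW bytes; real systems use a rolling hash here.
        window = data[i + 1 - WINDOW : i + 1]
        digest = hashlib.blake2b(window, digest_size=4).digest()
        is_anchor = int.from_bytes(digest, "big") % DIVISOR == DIVISOR - 1
        if is_anchor or size >= MAX_SIZE:
            chunks.append(data[start : i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


# Random content: inserting one byte at the front shifts every offset,
# yet most chunks are still found again because anchors follow content.
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(20000))
shifted = b"X" + original
original_chunks = chunk(original)
shared = set(original_chunks) & set(chunk(shifted))

# Non-random content: every window is identical, so either every position
# is an anchor or none is, and all chunks collapse to a single size.
zero_sizes = {len(c) for c in chunk(bytes(4096))}
```

The second case shows why assuming random file contents can be a poor fit: on highly regular data the anchor test gives the same answer at every position, so the chunker falls back entirely on its minimum or maximum size limit. The abstract states that CAC addresses non-random file types; how it does so is left to the paper itself.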

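Keywords: data de-duplicate; content aware chunking (CAC); candidate anchor histogram (CAH)

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; O151.21 [Automation and Computer Technology: Computer Science and Technology]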

 
