Survey on Data Deduplication Techniques for Storage Systems (存储系统重复数据删除技术研究综述)  Cited by: 27

Author: Xie Ping (谢平) [1,2]

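Affiliations: [1] School of Computer Science, Qinghai Normal University, Xining 810008, China; [2] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China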

Source: Computer Science (《计算机科学》), 2014, No. 1, pp. 22-30, 42 (10 pages in total)

Funding: Supported by the National Basic Research Program of China (973 Program) (2011CB302303)

Abstract: The ever-growing volume of enterprise data poses a severe challenge for data centers. Studies have found that up to 60% of the data in storage systems is redundant, and reducing this redundancy has drawn increasing attention from researchers. Data deduplication uses CPU resources to compare data-block fingerprints and thereby effectively reduces the space needed to store data; it has become a hot topic in both industry and academia. Based on an analysis and summary of the data deduplication literature of the past ten years, this paper first analyzes the architecture of volume-level deduplication systems and describes the principles, implementation mechanisms, and evaluation criteria of deduplication systems. It then analyzes and summarizes the techniques for improving deduplication performance, taking into account how data characteristics and system scale affect that performance. Finally, it compares deduplication systems across application scenarios and identifies four directions that need further research: deduplication schemes for primary storage, deduplication schemes for distributed cluster environments, fast fingerprint lookup optimization, and intelligent data detection.
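The fingerprint comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: fixed-size chunking, SHA-1 as the fingerprint function, the in-memory dictionary index, and the names `deduplicate`, `restore`, and `dedup_ratio` are all assumptions made for illustration, and the ratio is assumed to mean logical bytes divided by stored bytes.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each unique chunk once.

    Assumption: fixed-size chunking with SHA-1 fingerprints; real systems
    may use content-defined chunking and other hash functions.
    """
    store = {}   # fingerprint -> chunk bytes (unique chunks only)
    recipe = []  # ordered fingerprints needed to rebuild the input
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha1(chunk).hexdigest()
        if fingerprint not in store:  # fingerprint lookup decides "new or duplicate"
            store[fingerprint] = chunk
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original byte stream from the chunk store and recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


def dedup_ratio(data: bytes, store) -> float:
    """Logical size divided by stored size (assumed definition)."""
    stored = sum(len(chunk) for chunk in store.values())
    return len(data) / stored if stored else 0.0


if __name__ == "__main__":
    sample = b"A" * 4096 * 3 + b"B" * 4096  # three identical chunks plus one distinct
    store, recipe = deduplicate(sample)
    print(len(store), len(recipe), dedup_ratio(sample, store))
```

In this sketch every chunk costs one hash computation and one index lookup, which gives a rough sense of why the abstract lists fast fingerprint lookup among the directions for further work.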

Keywords: data deduplication; deduplication ratio; architecture; metadata structure; I/O optimization

Classification No.: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
