Using multi-threads to hide deduplication I/O latency with low synchronization overhead  (Cited by: 1)


Authors: Zhu Rui, Qin Lei-hua, Zhou Jing-li, Zheng Huan

Affiliations: [1] School of Computer, Huazhong University of Science and Technology; [2] Wuhan National Lab for Optoelectronics

Source: Journal of Central South University, 2013, No. 6, pp. 1582-1591 (10 pages)

Funding: Project (IRT0725) supported by the Changjiang Innovative Group of the Ministry of Education, China

Abstract: Data deduplication, as a compression method, has been widely used in most backup systems to improve bandwidth and space efficiency. As the volume of data to be backed up explodes, the two main challenges in data deduplication are the CPU-intensive chunking and hashing work and the I/O-intensive disk-index access latency. Since the CPU-intensive work has been vastly parallelized and sped up by multi-core and many-core processors, the I/O latency is likely to become the bottleneck in data deduplication. To alleviate the challenge of I/O latency in multi-core systems, a multi-threaded deduplication (Multi-Dedup) architecture was proposed. The main idea of Multi-Dedup is to use parallel deduplication threads to hide the I/O latency. A prefix-based concurrent index was designed to maintain the internal consistency of the deduplication index with low synchronization overhead. In addition, a collisionless cache array was designed to preserve locality and similarity within the parallel threads. In experiments on various real-world datasets, Multi-Dedup achieves 3-5 times performance improvement when incorporated with the locality-based ChunkStash and the local-similarity-based SiLo methods. Moreover, Multi-Dedup dramatically decreases the synchronization overhead and achieves 1.5-2 times performance improvement compared with traditional lock-based synchronization methods.
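The prefix-based concurrent index described in the abstract can be illustrated with a minimal sketch. This is an assumption about the general technique, not the paper's exact design: the fingerprint space is partitioned by the fingerprint's leading byte, and each partition carries its own lock, so parallel deduplication threads whose chunks fall into different partitions never contend. The names `PrefixIndex`, `insert_if_absent`, and `dedup_worker` are hypothetical.

```python
import hashlib
import threading

NUM_PARTITIONS = 16  # hypothetical partition count

class PrefixIndex:
    """A fingerprint index split into partitions by fingerprint prefix,
    with one lock per partition (a sketch of a prefix-based concurrent
    index, not the paper's implementation)."""

    def __init__(self, partitions=NUM_PARTITIONS):
        self.partitions = [dict() for _ in range(partitions)]
        self.locks = [threading.Lock() for _ in range(partitions)]

    def _partition(self, fingerprint: bytes) -> int:
        # Route by the fingerprint's first byte (its "prefix"), so two
        # threads only contend when their prefixes collide.
        return fingerprint[0] % len(self.partitions)

    def insert_if_absent(self, fingerprint: bytes) -> bool:
        """Return True if the chunk is new (must be stored),
        False if it is a duplicate."""
        p = self._partition(fingerprint)
        with self.locks[p]:
            if fingerprint in self.partitions[p]:
                return False
            self.partitions[p][fingerprint] = True
            return True

def dedup_worker(index, chunks, stats, stats_lock):
    """Hash each chunk and probe the shared index; count new chunks."""
    new = 0
    for chunk in chunks:
        fp = hashlib.sha1(chunk).digest()
        if index.insert_if_absent(fp):
            new += 1
    with stats_lock:
        stats.append(new)

if __name__ == "__main__":
    index = PrefixIndex()
    # 200 chunks, but only 50 distinct payloads.
    data = [b"block-%d" % (i % 50) for i in range(200)]
    stats, slock = [], threading.Lock()
    threads = [
        threading.Thread(target=dedup_worker,
                         args=(index, data[i::4], stats, slock))
        for i in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(stats))  # 50: each distinct chunk is stored exactly once
```

Because each fingerprint is checked and inserted under its partition's lock, a duplicate chunk is never stored twice even when two threads race on it, while threads working on fingerprints with different prefixes proceed without any synchronization between them.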

Keywords: multi-thread; multi-core; parallel; data deduplication

Classification: TP309.3 [Automation and Computer Technology / Computer System Architecture]
