适用于海量数据应用的多维Hash表结构  被引量:2

Multi-dimensional Hash table structure for massive data applications

在线阅读下载全文

作  者:吴泉源[1] 彭灿[1] 郑毅[1] 卜俊丽 WU Quanyuan PENG Can ZHENG Yi BU Jun(School of Computer, National University of Defense Technology, Changsha 410073, China)

机构地区:[1]国防科技大学计算机学院,长沙410073

出  处:《清华大学学报(自然科学版)》2017年第6期586-590,共5页Journal of Tsinghua University(Science and Technology)

摘  要:传统的Hash表通过对目标数据进行Hash计算,可以实现数据的快速存取与检索。为了保持较好的存储性能,需要使整个Hash表保持疏松的状态,从而牺牲掉10%~25%的空间。这对于海量数据存储而言,是一种巨大的空间浪费。该文提出一种多维Hash表结构,通过增加Hash表在逻辑上的维度,大大降低了Hash表的冲突率,实现了在较高的填充率下获得较满意的性能。实验结果表明:在千万的数据量级上,二维Hash表的冲突率比传统Hash表的减小2~4个数量级,总体性能则提升了1个数量级。该文还在原有填充率的基础上,提出失效率的概念,进一步完善和统一了Hash表性能评价指标。Traditional Hash table can quickly locate the target by calculating the Hash value of the target data to enable fast data access and retrieval. Good storage performance requires that the Hash table maintains a loose state by sacrificing 10%--25% of the space. This is a tremendous waste of space in massive data storage systems. This paper presents a multi-dimensional Hash table structure that by increasing the logical dimension of the Hash table to significantly reduce the collision rate in the Hash table for satisfactory performance with a high filling rate. Tests show that with ten million entries, the collision rate of a two dimensional Hash table is 2--4 orders of magnitude lower than a traditional Hash table and the overall performance is improved by 1 order of magnitude. In addition, a failure rate concept is proposed to improve Hash table performance evaluations.

关 键 词:多维 HASH表 海量数据存储 失效率 

分 类 号:TP311.12[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象