Study and Realisation of a Fast Parallel Import Tool for Very-Large Data (cited by: 1)

Source: Computer Applications and Software, 2015, No. 9, pp. 26-30 (5 pages)

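Funding: Key Science and Technology Research Project of the Education Department of Henan Province (12B520025); Zhengzhou Science and Technology Research Project (20120473); University-level Research Project (KYZR201006)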

Abstract: With the rapid growth of large-scale data and the demand for high reliability, migrating local data to a distributed database has become unavoidable. To address this, the paper proposes a MapReduce-based "fast parallel import" technique that makes full use of the cluster's parallel computing capacity and writes data directly into HFile, the underlying storage file of HBase. This avoids the time spent in the upper-layer import path and reduces resource overhead, and it addresses the limited functionality and low efficiency of importing data from a single-machine database into the distributed HBase database. Experimental results show that the fast parallel import tool designed and implemented on the basis of this technique supports fast import of text data with multiple column families. Compared with the traditional approach of importing data through the API, import speed is more than doubled.
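The idea in the abstract can be illustrated with a small, HBase-free sketch. It is not the paper's implementation, which this page does not show. It only mimics the data flow that a direct-to-HFile import generally relies on: a map step parses each text line into cells, a partition step assigns each row to a region by its key, and the cells for each (region, column family) pair are sorted so they could be written sequentially to a storage file. The function names, the tab-separated input layout, the `family:qualifier` column notation, and the split keys are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def parse_line(line, columns, sep="\t"):
    """Map step: turn one text line into (row_key, family, qualifier, value) cells.

    `columns` lists a hypothetical "family:qualifier" name for every field
    that follows the row key.
    """
    fields = line.rstrip("\n").split(sep)
    row_key, values = fields[0], fields[1:]
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    cells = []
    for col, value in zip(columns, values):
        family, qualifier = col.split(":", 1)
        cells.append((row_key, family, qualifier, value))
    return cells


def region_index(row_key, split_keys):
    """Partition step: region i holds keys in [split_keys[i-1], split_keys[i])."""
    return bisect_right(split_keys, row_key)


def build_sorted_files(lines, columns, split_keys):
    """Group cells per (region, family) and sort them by (row, qualifier, value).

    Each resulting list stands in for the contents of one sorted storage file.
    """
    files = defaultdict(list)
    for line in lines:
        for row_key, family, qualifier, value in parse_line(line, columns):
            region = region_index(row_key, split_keys)
            files[(region, family)].append((row_key, qualifier, value))
    return {key: sorted(cells) for key, cells in sorted(files.items())}


if __name__ == "__main__":
    sample = ["row3\tAlice\t30\tNYC", "row1\tBob\t25\tLA", "row7\tCarol\t41\tSF"]
    cols = ["info:name", "info:age", "addr:city"]
    for key, cells in build_sorted_files(sample, cols, ["row5"]).items():
        print(key, cells)
```

In this sketch each column family gets its own output per region, which is why multi-column-family input needs no special handling beyond the grouping key. In a real cluster the sorting and grouping would be carried out by the MapReduce shuffle and reducers rather than in memory.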

Keywords: Hadoop; HBase; MapReduce; distributed database; large-scale data import

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
