Research on Construction Pattern of Hadoop Data Warehouse (基于Hadoop的数据仓库构建模式研究)  Cited by: 7


Authors: 王缓缓 (Wang Huanhuan)[1], 郭敬义 (Guo Jingyi), 张警灿 (Zhang Jingcan), 余肖生 (Yu Xiaosheng)[1]

Affiliation: [1] College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China

Source: Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), 2015, No. 7, pp. 69-73 (5 pages)

Funding: Natural Science Research Project of the Hubei Provincial Department of Education (Q20141212)

Abstract: Hadoop-based data warehouses are currently built mostly with a "one-to-one" pattern (rendered as "case to case" in the original English abstract). This paper first introduces that pattern and uses an example to show its shortcomings. Then, drawing on the Builder design pattern from software engineering, it proposes a construction pattern called the "metadata-driven builder pattern" for building Hadoop Hive data warehouses, that is, for implementing the ETL process. The pattern has two advantages. First, it is driven by metadata, and the metadata is manipulated through a relational database management system (RDBMS); because Hive's metadata is itself stored in an RDBMS, this takes full advantage of the RDBMS's efficiency in metadata operations. Second, it distinguishes "general knowledge" from "specific-object knowledge" and designs and implements the ETL process on the basis of that distinction, which eliminates much of the unnecessary repetition that the "one-to-one" pattern entails.
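The abstract gives no implementation details, so the following is only an illustrative sketch of how a metadata-driven builder might separate the two kinds of knowledge. It is not the authors' code. Everything in it is an assumption made for illustration: the `etl_columns` metadata table, the `HiveEtlBuilder` class, the `build_all` function, and the use of an in-memory SQLite database as a stand-in for the RDBMS that holds the metadata. The builder's fixed step sequence plays the role of "general knowledge"; the per-table metadata rows play the role of "specific-object knowledge".

```python
import sqlite3


def load_metadata(conn):
    """Read per-table column metadata (the 'specific-object knowledge')."""
    rows = conn.execute(
        "SELECT table_name, column_name, hive_type "
        "FROM etl_columns ORDER BY table_name, position"
    ).fetchall()
    tables = {}
    for table, column, hive_type in rows:
        tables.setdefault(table, []).append((column, hive_type))
    return tables


class HiveEtlBuilder:
    """Hypothetical builder: the step sequence is the 'general knowledge'."""

    def __init__(self, table, columns):
        self.table = table
        self.columns = columns
        self.statements = []

    def build_ddl(self):
        # Generate the Hive table definition from the metadata rows.
        cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in self.columns)
        self.statements.append(
            f"CREATE TABLE IF NOT EXISTS {self.table} ({cols}) "
            "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
        )
        return self

    def build_load(self, staging_dir):
        # Generate the load step; the staging layout is an assumption.
        self.statements.append(
            f"LOAD DATA INPATH '{staging_dir}/{self.table}' INTO TABLE {self.table}"
        )
        return self

    def result(self):
        return list(self.statements)


def build_all(conn, staging_dir):
    """Run the same builder steps for every table described in the metadata."""
    return {
        table: HiveEtlBuilder(table, columns).build_ddl().build_load(staging_dir).result()
        for table, columns in load_metadata(conn).items()
    }


def demo_connection():
    """Create an in-memory metadata store with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE etl_columns "
        "(table_name TEXT, position INTEGER, column_name TEXT, hive_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO etl_columns VALUES (?, ?, ?, ?)",
        [
            ("orders", 1, "order_id", "BIGINT"),
            ("orders", 2, "amount", "DOUBLE"),
            ("customers", 1, "customer_id", "BIGINT"),
            ("customers", 2, "name", "STRING"),
        ],
    )
    return conn


if __name__ == "__main__":
    for table, statements in build_all(demo_connection(), "/staging").items():
        print(table)
        for statement in statements:
            print("  " + statement)
```

In this sketch, adding a new table to the warehouse means inserting rows into the metadata store rather than writing another table-specific script, which is the kind of repetition the abstract says the "one-to-one" pattern incurs.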

Keywords: cloud computing; big data; data warehouse; Hadoop; ETL

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
