HaoLap: An Hadoop Based OLAP System for Massive Data  (Cited by: 5)


Authors: Guo Chaopeng, Wang Zhi[1], Han Feng[1], Zhang Yichuan[1], Song Jie[1]

Affiliation: [1] Software College, Northeastern University, Shenyang 110819

Source: Journal of Computer Research and Development, 2013, No. S1, pp. 378-383 (6 pages)

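Funding: National Natural Science Foundation of China (61202088); Fundamental Research Funds for the Central Universities (100704001); Natural Science Foundation of Liaoning Province (200102059)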

Abstract: In recent years, with the development of computer technology and its widespread use in fields such as the Internet, sensors, and scientific data analysis, data volumes have grown explosively. Massive data poses new challenges for traditional data management and analysis, and both academia and industry have widely adopted distributed file systems and the MapReduce programming model to meet them. This paper presents HaoLap (Hadoop based OLAP), an OLAP system for massive data built on the Hadoop Distributed File System (HDFS) and the MapReduce programming model. Drawing on the experience of MOLAP, HaoLap stores the multidimensional model as metadata and the fact data in HDFS, uses a coding method to map between dimensions and fact data, and uses MapReduce to perform OLAP operations. The paper first describes the key techniques of HaoLap, including the system architecture, dimension definition and coding, fact data storage and coding, the OLAP algorithm, and the service interface. It then presents an application case of HaoLap in scientific data analysis and compares its performance with mainstream non-relational data management systems. Experimental results show that, although its data loading performance is somewhat weaker, HaoLap outperforms HBase, Hive, HadoopDB, and other mainstream non-relational data management systems in OLAP performance.

Keywords: multidimensional data model; OLAP; massive data; HDFS; MapReduce

Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
