Big Data 3.0: The Key Technologies of Big Data in the Post-Hadoop Era  Cited by: 6


Authors: Liu Wanggen; Sun Yuanhao (Transwarp Technology (Shanghai) Co., Ltd., Shanghai 200233, China)

Affiliation: [1] Transwarp Technology (Shanghai) Co., Ltd., Shanghai 200233, China

Source: Frontiers of Data & Computing, 2019, No. 1, pp. 94-104 (11 pages)

Abstract: [Objective] The first generation of big data architecture, represented by Hadoop, is overly complex, falls short on performance and stability, and does not integrate well with cloud computing. As cloud computing and new hardware are adopted across industry, these shortcomings draw growing complaints from users, so Transwarp redesigned the big data technology stack to make big data technology easier and more effective to use. [Methods] The new stack comprises a resource management and scheduling layer that manages services and tasks with different life cycles; a unified storage management layer that meets the needs of different data types by plugging different storage engines in and out; a unified DAG-based computing engine that supports multiple workloads, including data warehousing, stream computing, and graph computing; and a development layer that exposes standard SQL and Python interfaces to reduce coding complexity. [Results] Using Kubernetes to manage data services uniformly, together with container technology, gives better multi-tenancy and lets big data technology work well with cloud computing. Applications and big data system software run on one unified platform, which bridges big data and business systems and better supports turning data into business services and business activity into data. The design has been validated in large-scale commercial deployments. [Conclusions] The redesigned architecture not only resolves the technical problems of the first-generation, Hadoop-based implementations but also integrates better with cloud computing and new hardware, and it can represent the direction in which the next generation of the basic big data technology stack will develop.

Keywords: big data; cloud computing; DAG; real-time computing; Kubernetes; multi-tenancy; unified storage management

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
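
Illustrative sketch (not from the paper): the abstract describes a storage layer with pluggable engines feeding a DAG-based computing engine. The toy Python below shows one way those two ideas could fit together. Every name in it (`StorageEngine`, `InMemoryEngine`, `StorageManager`, `run_dag`, `demo`) is a hypothetical stand-in chosen for illustration, and nothing here reflects Transwarp's actual implementation; scheduling uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Protocol


class StorageEngine(Protocol):
    """Hypothetical plug-in interface for a unified storage layer."""

    def read(self, key: str) -> list: ...
    def write(self, key: str, rows: list) -> None: ...


class InMemoryEngine:
    """Toy engine that stands in for any concrete storage plug-in."""

    def __init__(self) -> None:
        self._data: Dict[str, list] = {}

    def read(self, key: str) -> list:
        return self._data[key]

    def write(self, key: str, rows: list) -> None:
        self._data[key] = list(rows)


class StorageManager:
    """Routes each data type to whichever engine is registered for it."""

    def __init__(self) -> None:
        self._engines: Dict[str, StorageEngine] = {}

    def register(self, data_type: str, engine: StorageEngine) -> None:
        self._engines[data_type] = engine

    def engine_for(self, data_type: str) -> StorageEngine:
        return self._engines[data_type]


def run_dag(tasks: Dict[str, Callable[[dict], object]],
            deps: Dict[str, set]) -> dict:
    """Run tasks in dependency order; each task sees all earlier results."""
    results: dict = {}
    # deps maps each task name to the set of tasks it depends on.
    for name in TopologicalSorter(deps).static_order():
        results[name] = tasks[name](results)
    return results


def demo() -> dict:
    storage = StorageManager()
    storage.register("table", InMemoryEngine())
    storage.engine_for("table").write("orders", [3, 5, 8])

    tasks = {
        "load": lambda r: storage.engine_for("table").read("orders"),
        "double": lambda r: [x * 2 for x in r["load"]],
        "total": lambda r: sum(r["double"]),
    }
    deps = {"load": set(), "double": {"load"}, "total": {"double"}}
    return run_dag(tasks, deps)
```

Calling `demo()` runs the three tasks in the order `load`, `double`, `total`. Swapping `InMemoryEngine` for another class with the same `read`/`write` methods would leave the DAG code unchanged, which is the point of the pluggable-engine idea.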

 
