PiFlow: a model-driven big data pipeline framework  Cited by: 8


Authors: ZHU Xiaojie [1], ZHAO Zihao, DU Yi [1,2]

Affiliations: [1] Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China

Source: Journal of Computer Applications, 2020, No. 6, pp. 1638-1647 (10 pages)

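Funding: National Key Research and Development Program of China, Cloud Computing and Big Data key special project (2018YFB1004001); Key Program of the National Natural Science Foundation of China (61836013); Major Science and Technology Program of China National Tobacco Corporation (110201801019(SJ-01)).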

Abstract: Big data processing with complex workflows mostly relies on pipeline systems, but pipeline systems for big data processing fall short in usability, function reusability, extensibility, and processing performance. To address these problems, improve the efficiency of building and developing big data processing environments, and optimize the processing flow, a model-driven big data pipeline framework called PiFlow is proposed. First, the big data processing procedure is abstracted as a directed acyclic graph. Then, a series of components is developed for constructing data processing pipelines, and a pipeline task execution mechanism is designed. In addition, to standardize and simplify the description of the pipeline framework, a model-driven big data pipeline description language, PiFlowDL, is designed; it describes big data processing tasks in a modular and hierarchical way. PiFlow configures pipelines in a What You See Is What You Get (WYSIWYG) manner and integrates functions such as status monitoring, template configuration, and component integration. Compared with Apache NiFi, it achieves a 2- to 7-fold performance improvement.
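The idea of abstracting a processing task as a directed acyclic graph of reusable components can be illustrated with a minimal sketch. This is not PiFlow's API or PiFlowDL syntax: the `Pipeline` class, its `add` and `run` methods, and the component names below are hypothetical, chosen only to show how a DAG description might be turned into an execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Pipeline:
    """Hypothetical toy pipeline: named components connected as a DAG."""

    def __init__(self):
        self.components = {}  # name -> callable taking a list of upstream outputs
        self.upstream = {}    # name -> set of component names it depends on

    def add(self, name, func, after=()):
        """Register a component and the components that must run before it."""
        self.components[name] = func
        self.upstream[name] = set(after)
        return self

    def run(self):
        """Run every component once, each after all of its upstream components."""
        results = {}
        # static_order() raises graphlib.CycleError if the graph has a cycle.
        for name in TopologicalSorter(self.upstream).static_order():
            inputs = [results[dep] for dep in sorted(self.upstream[name])]
            results[name] = self.components[name](inputs)
        return results


# Usage: a three-stage pipeline (load -> dedupe -> count).
pipeline = (
    Pipeline()
    .add("load", lambda _: [3, 1, 2, 3])
    .add("dedupe", lambda ins: sorted(set(ins[0])), after=["load"])
    .add("count", lambda ins: len(ins[0]), after=["dedupe"])
)
results = pipeline.run()
print(results["count"])
```

Keeping the graph description (which component follows which) separate from the component code is what makes components reusable across pipelines; a real framework would additionally distribute the work and monitor each stage, which this sketch does not attempt.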

Keywords: big data; pipeline; pipeline scheduling; model-driven development; data processing

Classification number: TP311.56 [Automation and Computer Technology: Computer Software and Theory]

 
