Realising Data-Centric Scientific Workflows with Provenance-Capturing on Data Lakes  


Authors: Hendrik Nolte, Philipp Wieder

Affiliation: [1] Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, Göttingen 37077, Germany

Source: Data Intelligence (数据智能, English edition), 2022, Issue 2, pp. 426-438 (13 pages)

Funding: Funded by the "Niedersächsisches Vorab" funding line of the Volkswagen Foundation.

Abstract: Since their introduction by James Dixon in 2010, data lakes have received increasing attention, driven by the promise of high reusability of the stored data due to schema-on-read semantics. Building on this idea, several additional requirements have been discussed in the literature to improve the general usability of the concept, such as a central metadata catalog including all provenance information, overarching data governance, or integration with (high-performance) processing capabilities. Although the need for a logical and a physical organisation of data lakes to meet those requirements is widely recognized, no concrete guidelines have yet been provided. The most common architecture implementing this conceptual organisation is the zone architecture, where data is assigned to a certain zone depending on its degree of processing. This paper discusses how FAIR Digital Objects can be used in a novel approach to organize a data lake based on data types instead of zones, how they can be used to abstract the physical implementation, and how they enable generic and portable processing capabilities based on a provenance-based approach.

Keywords: Data lake; Provenance; Workflows; FAIR Digital Objects; CWFR

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
