Predicting an Optimal Virtual Data Model for Uniform Access to Large Heterogeneous Data  

Authors: Chahrazed B. Bachir Belmehdi, Abderrahmane Khiat, Nabil Keskes

Affiliations: [1] LabRI-SBA, Enterprise Information Systems, ESI-SBA Institute, Fraunhofer IAIS, Algeria, Germany

Source: Data Intelligence, 2024, Issue 2, pp. 504-530 (27 pages)

Funding: Financial support from the Fraunhofer Cluster of Excellence (CCIT)

Abstract: The growth of data generated in industry calls for new, efficient big data integration approaches that give end users uniform data access and support better business operations. Data virtualization systems, including Ontology-Based Data Access (OBDA), query data on-the-fly against the original data sources without any prior data materialization. Existing approaches use, by design, a fixed model such as TABULAR as the only Virtual Data Model, a uniform schema built on-the-fly to load, transform, and join relevant data. Other data models, such as GRAPH or DOCUMENT, are more flexible and can therefore be more suitable for some common types of queries, such as join or nested queries. The best model for such queries is hard to predict because it depends on many criteria, such as query plan, data model, data size, and operations. To address the problem of selecting the optimal virtual data model for queries on large datasets, we present a new approach that (1) builds on the principle of OBDA to query and join large heterogeneous data in a distributed manner and (2) calls a deep learning method to predict the optimal virtual data model using features extracted from SPARQL queries. OPTIMA, the implementation of our approach, currently leverages state-of-the-art Big Data technologies, Apache Spark and GraphX; implements two virtual data models, GRAPH and TABULAR; and supports five data source models out of the box: property graph, document-based, wide-columnar, relational, and tabular, stored in Neo4j, MongoDB, Cassandra, MySQL, and CSV, respectively. Extensive experiments show that our approach returns the optimal virtual model with an accuracy of 0.831, yielding a reduction in query execution time of over 40% for the tabular model selection and over 30% for the graph model selection.

Keywords: Data Virtualization; Big Data; OBDA; Deep Learning

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
