Predicting an Optimal Virtual Data Model for Uniform Access to Large Heterogeneous Data

Authors: Chahrazed B. Bachir Belmehdi, Abderrahmane Khiat, Nabil Keskes

Affiliations: [1] LabRI-SBA, Enterprise Information Systems, ESI-SBA Institute, Fraunhofer IAIS, Algeria, Germany

Source: Data Intelligence (数据智能, English edition), 2024, Issue 2, pp. 504-530, 27 pages

Funding: financial support of the Fraunhofer Cluster of Excellence (CCIT)

Abstract: The growth of generated data in industry requires new, efficient big data integration approaches that give end-users uniform data access and so support better business operations. Data virtualization systems, including Ontology-Based Data Access (OBDA), query data on-the-fly against the original data sources without any prior data materialization. Existing approaches by design use a fixed model, e.g., TABULAR, as the only Virtual Data Model, a uniform schema built on-the-fly to load, transform, and join relevant data. Other data models, such as GRAPH or DOCUMENT, are more flexible and can therefore be more suitable for some common types of queries, such as join or nested queries. The performance of those queries is hard to predict because it depends on many criteria, such as query plan, data model, data size, and operations. To address the problem of selecting the optimal virtual data model for queries on large datasets, we present a new approach that (1) builds on the principle of OBDA to query and join large heterogeneous data in a distributed manner and (2) calls a deep learning method to predict the optimal virtual data model using features extracted from SPARQL queries. OPTIMA, the implementation of our approach, currently leverages state-of-the-art Big Data technologies, Apache Spark and GraphX, implements two virtual data models, GRAPH and TABULAR, and supports five data source models out of the box: property graph, document-based, wide-columnar, relational, and tabular, stored in Neo4j, MongoDB, Cassandra, MySQL, and CSV, respectively. Extensive experiments show that our approach returns the optimal virtual model with an accuracy of 0.831, yielding a reduction in query execution time of over 40% for the tabular model selection and over 30% for the graph model selection.

Keywords: Data Virtualization; Big Data; OBDA; Deep Learning

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
