Authors: Chahrazed B. Bachir Belmehdi, Abderrahmane Khiat, Nabil Keskes
Affiliations: [1] LabRI-SBA, Enterprise Information Systems, ESI-SBA Institute, Fraunhofer IAIS, Algeria, Germany
Source: Data Intelligence (数据智能, English edition), 2024, No. 2, pp. 504-530 (27 pages)
Funding: financial support of the Fraunhofer Cluster of Excellence (CCIT)
Abstract: The growth of generated data in the industry requires new efficient big data integration approaches for uniform data access by end-users to perform better business operations. Data virtualization systems, including Ontology-Based Data Access (OBDA), query data on-the-fly against the original data sources without any prior data materialization. Existing approaches by design use a fixed model, e.g., TABULAR, as the only Virtual Data Model, a uniform schema built on-the-fly to load, transform, and join relevant data. Other data models, such as GRAPH or DOCUMENT, are more flexible and thus can be more suitable for some common types of queries, such as join or nested queries. Which model suits such queries is hard to predict, because it depends on many criteria, such as query plan, data model, data size, and operations. To address the problem of selecting the optimal virtual data model for queries on large datasets, we present a new approach that (1) builds on the principle of OBDA to query and join large heterogeneous data in a distributed manner and (2) calls a deep learning method to predict the optimal virtual data model using features extracted from SPARQL queries. OPTIMA, the implementation of our approach, currently leverages state-of-the-art Big Data technologies, Apache Spark and GraphX, implements two virtual data models, GRAPH and TABULAR, and supports five data source models out of the box: property graph, document-based, wide-column, relational, and tabular, stored in Neo4j, MongoDB, Cassandra, MySQL, and CSV, respectively. Extensive experiments show that our approach returns the optimal virtual model with an accuracy of 0.831, reducing query execution time by over 40% for the tabular model selection and over 30% for the graph model selection.
Keywords: Data Virtualization; Big Data; OBDA; Deep Learning
Classification code: TP311.13 [Automation and Computer Technology - Computer Software and Theory]