A mediation system for continuous spatial queries on a unified schema using Apache Spark  

Authors: Thi Thu Trang Ngo, François Pinet, David Sarramia, Myoung-Ah Kang

Affiliations: [1] Université Clermont Auvergne, ISIMA, Aubière, France; [2] Université Clermont Auvergne, INRAE, UR TSCF, Clermont-Ferrand, France; [3] Université Clermont Auvergne, CNRS/IN2P3, LPC, Clermont-Ferrand, France

Source: Big Earth Data, 2024, Issue 1, pp. 115-141 (27 pages)

Funding: Financed by the French government IDEX-ISITE initiative 16-IDEX-0001 (CAP 20-25); the PhD is funded by the European Regional Development Fund (FEDER).

Abstract: Recent advances in big and streaming data systems have enabled real-time analysis of data generated by Internet of Things (IoT) systems and sensors in various domains. In this context, many applications require integrating data from several heterogeneous sources, either stream or static. Frameworks such as Apache Spark can integrate and process large datasets from different sources. However, these frameworks are hard to use when the data sources are heterogeneous and numerous. To address this issue, we propose a system based on mediation techniques for integrating stream and static data sources. The integration process of our system consists of three main steps: configuration, query expression, and query execution. In the configuration step, an administrator designs a mediated schema and defines mappings between the mediated schema and the local data sources. In the query expression step, users express queries on the mediated schema using a customized SQL grammar. Finally, in the query execution step, our system rewrites the query into an optimized Spark application and submits the application to a Spark cluster. The results are continuously returned to users. Our experiments show that our optimizations can improve query execution time by up to one order of magnitude, making complex streaming and spatial data analysis more accessible.
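
The three steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the source names, column names, and the `rewrite` helper are all hypothetical. Rewriting is reduced to projection only, and the output is plain SQL strings instead of a Spark application.

```python
from dataclasses import dataclass


@dataclass
class SourceMapping:
    """Maps one local source onto the mediated schema (all names hypothetical)."""
    source: str    # local source name
    kind: str      # "stream" or "static"
    columns: dict  # mediated attribute -> local column


# Step 1 (configuration): an administrator declares the mediated schema
# and how each local source maps onto it.
MEDIATED_SCHEMA = ("sensor_id", "temperature", "location")
MAPPINGS = [
    SourceMapping("kafka_readings", "stream",
                  {"sensor_id": "sid", "temperature": "temp_c"}),
    SourceMapping("sensor_registry", "static",
                  {"sensor_id": "id", "location": "geom"}),
]


def rewrite(attributes, mappings=MAPPINGS):
    """Step 3 (rewriting), reduced to projection: for each local source,
    build the query that fetches the requested mediated attributes."""
    unknown = [a for a in attributes if a not in MEDIATED_SCHEMA]
    if unknown:
        raise ValueError(f"not in mediated schema: {unknown}")
    plan = {}
    for m in mappings:
        cols = [f"{m.columns[a]} AS {a}" for a in attributes if a in m.columns]
        if cols:
            plan[m.source] = f"SELECT {', '.join(cols)} FROM {m.source}"
    return plan


# Step 2 (query expression): the user refers only to mediated attributes.
plan = rewrite(["sensor_id", "temperature", "location"])
for source, sql in plan.items():
    print(source, "->", sql)
```

The user never sees the local column names (`sid`, `temp_c`, `geom`). A real mediator would also have to join the per-source results, handle stream windows and spatial predicates, and apply the optimizations the abstract mentions, none of which this sketch attempts.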

Keywords: streaming data; streaming data integration; mediator; geospatial data; continuous queries

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
