Massive Data Parallel Query Platform Based on Distributed Shared-nothing Architecture

Cited by: 9

Authors: QIN Dong-ming, YU Jian[1], ZHANG Bo[2], ZHAO Qin[1,2]

Affiliations: [1] Key Laboratory of Embedded System and Service Computing of Ministry of Education, Tongji University, Shanghai 200092, China; [2] College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China

Source: Computer Science (《计算机科学》), 2019, No. 4, pp. 44-49 (6 pages)

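Funding: Supported by the High Performance Computing special project of the National Key R&D Program of China (2016YFB0200300); the National Natural Science Foundation of China (61572326; 61702333); the Open Project of the Key Laboratory of Embedded System and Service Computing of Ministry of Education, Tongji University (ESSCKF 2016-01); and the Local University Capacity Building Project of the Science and Technology Commission of Shanghai Municipality (17070502800).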

Abstract: To address the challenges of data loading and parallel query control in massive data query systems, this paper proposes a massive data parallel query platform based on a distributed shared-nothing architecture. The platform uses the shared-nothing architecture to provide unified processing of structured and unstructured data for massive data queries, and performs aggregate computation over the data held in the platform. Its core techniques are as follows. First, the platform provides cross-platform storage and unified loading for multiple types of data. Second, it distributes query task flows across multiple nodes based on load balancing and generates a global query execution strategy. Finally, it uses two methods, Hash and Range, to implement concurrency control for the query task flows. In the authors' tests, the platform reduced query time by nearly 40% compared with a non-parallel approach. The experimental results indicate that the platform performs well in terms of the correctness, reliability, and concurrency of massive data queries.
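The abstract names Hash and Range as the two methods applied to query task flows but gives no implementation detail. The following is a minimal sketch of how hash-based and range-based routing of rows to shared-nothing nodes is commonly done; it is not the authors' method. All names here (`hash_node`, `range_node`, `partition`, the split points, and the sample rows) are hypothetical and chosen only for illustration.

```python
import bisect
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def hash_node(key: str, num_nodes: int) -> int:
    """Route a key to a node index with a stable CRC32 hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def range_node(key: int, split_points: Sequence[int]) -> int:
    """Route a key by sorted split points.

    Node 0 holds keys below split_points[0]; node i holds keys in
    [split_points[i-1], split_points[i]); the last node holds the rest.
    """
    return bisect.bisect_right(split_points, key)


def partition(rows: List[dict], route: Callable[[dict], int]) -> Dict[int, List[dict]]:
    """Group rows by the node index chosen by the routing function."""
    shards: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        shards[route(row)].append(row)
    return dict(shards)


# Hypothetical sample data: four rows keyed by an integer id and a user name.
rows = [{"id": i, "user": f"u{i}"} for i in (5, 100, 150, 250)]

# Range routing with split points 100 and 200 yields three nodes.
by_range = partition(rows, lambda r: range_node(r["id"], [100, 200]))

# Hash routing spreads rows over three nodes regardless of key order.
by_hash = partition(rows, lambda r: hash_node(r["user"], 3))
```

In this sketch, range routing keeps neighbouring keys on the same node, which tends to suit range scans, while hash routing spreads keys more evenly, which tends to suit point lookups and load balance. Whether the platform described in the paper makes the same trade-off is not stated in the abstract.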

Keywords: Massive data query; Shared-nothing architecture; Concurrent query; Data loading

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
