检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]东北大学软件学院,沈阳110819 [2]东北大学信息科学与工程学院,沈阳110819
出 处:《计算机应用》2016年第1期8-12,共5页journal of Computer Applications
基 金:国家自然科学基金重大项目(61433008);国家自然科学青年基金资助项目(61202088);中国博士后科学基金面上项目(2013M540232);中央高校基本科研业务费专项(N120817001);教育部博士点基金资助项目(20120042110028)~~
摘 要:针对大数据环境下完整性查询时间代价消耗过高的问题,提出了一种采用近似完整性查询方法的系统——Probery。Probery所采用的近似完整性查询方法不同于传统的近似查询,其近似性主要体现为数据查全的可能性,是一种新型的数据查询方法。Probery首先将存入系统的数据划分为多个数据分段;然后,根据概率放置模型将各个数据分段的数据存储在分布式文件系统中;最后,对于给定的查询条件,Probery采用一种启发式查询方法进行概率查询。通过与其他主流的非关系型数据管理系统的查询性能进行比较,对Probery进行验证,Probery在损失8%查询完整性的情形下,查询时间较HBase相比节约了51%,较Cassandra相比节约了23%,较Mongo DB相比节约了12%,较Hive相比节约了3%。实验结果表明,Probery可以适当地损失查询完整性来提高数据的查询性能,具有较好的通用性、适应性和可扩展性。Since the time consumption of full-result query for big data is excessively high, the system Probery was proposed. Different from traditional approximate query, Probery adopted an approximate full-result query method, an original method to query data. The approximation of Probery referred to the probability of containing all data satisfying query conditions in query results. Firstly, Probery divided the data stored in system into multiple data segments. Secondly, Probery placed the data in Distributed File System( DFS) according to the probability placing model. Finally, given a query condition, Probery adopted a heuristic query method to query data probably. The performance of query data was shown by comparing with other dominated non-relational data management system, in the case that the completeness of result set lost by 8%. The query time consumption of Probery was saved by 51% compared with HBase, by 23% compared with Cassandra, by 12% compared with Mongo DB, by 3% compared with Hive. The experimental results show that Probery improves the performance of query data when the completeness of query data losses appropriately. In addition, Probery has better generality, adaptability and extensibility for big data query.
关 键 词:大数据 概率查询 查全概率 分布式文件系统 MAPREDUCE
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.49