Top-k Query Calculations on Uncertain Dataset under MapReduce Framework

Cited by: 7


Authors: LU Xin[1], CHEN Huahui[1], DONG Yihong[1], QIAN Jiangbo[1]

Affiliation: [1] College of Information Science and Engineering, Ningbo University, Ningbo 315211

Source: Pattern Recognition and Artificial Intelligence, 2013, No. 7, pp. 695-704 (10 pages)

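Funding: Supported by the National Natural Science Foundation of China (No. 60973047), the Natural Science Foundation of Zhejiang Province (No. Y1091189), the Public Welfare Technology Application Research Program of Zhejiang Province (No. 2011C21076), the Natural Science Foundation of Ningbo (No. 2009A610072), and the Hu Lan Doctoral Fund of Ningbo University (No. 2011277)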

Abstract: Top-k queries are widely used in the management of uncertain data. The Top-k query semantics based on parameterized ranking functions (PRF) unifies the various query semantics proposed in recent years. For massive uncertain datasets, an effective method for Top-k computation based on the MapReduce framework is proposed. By analyzing the PRF-based Top-k query semantics for uncertain data, an algorithm is designed to obtain an upper bound on the ranking function value of tuples that have not yet been evaluated. The ranking function value therefore need not be computed for every tuple, which solves the pruning problem in Top-k computation. Two different strategies are presented to implement the algorithm under the MapReduce computing model in Hadoop. Two groups of comparative experiments are carried out, one in a single-machine environment and one on the Hadoop distributed computing platform. The experimental results show that the algorithm considerably reduces computation time when processing massive uncertain data.

Keywords: uncertain data; Top-k query; MapReduce

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
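
The pruning idea summarized in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' algorithm, and it does not cover their two MapReduce strategies. It assumes a tuple-independent uncertain relation, distinct scores, and a PRF weight that depends only on rank and is non-negative and non-increasing. Under those assumptions, a tuple's PRF value is its existence probability times the expected weight of the rank it would receive, and that expected weight alone bounds the value of every tuple not yet scanned. The names `prf_topk`, `in_top2`, and `SAMPLE` are hypothetical.

```python
import heapq

def prf_topk(tuples, weight, k):
    """Top-k by PRF value with upper-bound pruning (illustrative sketch).

    tuples: iterable of (tuple_id, score, existence_probability)
    weight: weight(rank) for 1-based ranks; assumed non-negative and
            non-increasing in rank (an assumption of this sketch)
    Returns (list of (prf_value, tuple_id) in descending order,
             number of tuples whose PRF value was computed).
    """
    ordered = sorted(tuples, key=lambda t: -t[1])  # scan by descending score
    dist = [1.0]   # dist[c] = P(exactly c of the scanned tuples exist)
    heap = []      # min-heap holding the best k (prf_value, tuple_id) so far
    scanned = 0
    for tuple_id, _score, prob in ordered:
        # Expected weight of the rank this tuple would take if it existed.
        expected_w = sum(weight(c + 1) * q for c, q in enumerate(dist))
        # Since prob <= 1 and the weight is non-increasing, expected_w bounds
        # the PRF value of this tuple and of every later one.
        if len(heap) == k and expected_w <= heap[0][0]:
            break
        scanned += 1
        value = prob * expected_w
        if len(heap) < k:
            heapq.heappush(heap, (value, tuple_id))
        elif value > heap[0][0]:
            heapq.heapreplace(heap, (value, tuple_id))
        # Update the count distribution with this tuple's existence.
        new_dist = [0.0] * (len(dist) + 1)
        for c, q in enumerate(dist):
            new_dist[c] += q * (1.0 - prob)
            new_dist[c + 1] += q * prob
        dist = new_dist
    return sorted(heap, reverse=True), scanned

def in_top2(rank):
    """Hypothetical weight: 1 for ranks 1 and 2, else 0 (P(tuple is in top 2))."""
    return 1 if rank <= 2 else 0

# Hypothetical sample data: (tuple_id, score, existence_probability)
SAMPLE = [("a", 10, 0.9), ("b", 9, 0.8), ("c", 8, 0.5), ("d", 7, 0.9), ("e", 6, 0.3)]

if __name__ == "__main__":
    print(prf_topk(SAMPLE, in_top2, 2))
```

On the sample data with k = 2, the scan stops after the first two tuples, because the bound for the remaining tuples falls below the current second-best value.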

 
