Authors: Lu Xin (卢鑫)[1], Chen Huahui (陈华辉)[1], Dong Yihong (董一鸿)[1], Qian Jiangbo (钱江波)[1]
Affiliation: [1] College of Information Science and Engineering, Ningbo University, Ningbo 315211
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2013, No. 7, pp. 695-704 (10 pages)
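Funding: Supported by the National Natural Science Foundation of China (No. 60973047), the Zhejiang Provincial Natural Science Foundation (No. Y1091189), the Zhejiang Provincial Public Welfare Technology Application Research Program (No. 2011C21076), the Ningbo Natural Science Foundation (No. 2009A610072), and the Ningbo University Hu Lan Doctoral Fund (No. 2011277)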
Abstract: Top-k queries are widely used in the management of uncertain data. The Top-k query semantics based on parameterized ranking functions (PRF) unify the various query semantics proposed in recent years. For massive uncertain datasets, an effective method for Top-k computation based on the MapReduce framework is proposed. By analyzing the PRF-based Top-k query semantics for uncertain data, an algorithm is designed that obtains an upper bound on the ranking function value of tuples not yet evaluated. This avoids computing the ranking function value of every tuple and solves the pruning problem in Top-k computation. Two different strategies are presented to implement the algorithm under the MapReduce computing model. Two groups of comparative experiments are performed, one in a single-machine environment and one on the Hadoop distributed computing platform. The experimental results show that, when processing massive uncertain data, the proposed algorithm substantially reduces running time.
Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
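
The pruning idea summarized in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's algorithm: the record does not give the paper's upper bound or its two MapReduce strategies. The sketch assumes independent tuples with distinct scores, each with an existence probability, and a PRF of the commonly used form Υ(t) = Σ_i ω(i) · Pr(rank(t) = i) with non-negative weights. Under those assumptions Υ(t) ≤ Pr(t) · max_i ω(i), and this deliberately simple bound is used to skip exact evaluation. The function name `prf_topk` and the data layout are hypothetical.

```python
import heapq


def prf_topk(tuples, weights, k):
    """Return (top-k list of (prf_value, tuple_id), number of exact evaluations).

    tuples  : iterable of (tuple_id, score, prob); independent, distinct scores (assumed)
    weights : weights[i] is the non-negative weight of rank i + 1; ranks beyond
              len(weights) contribute nothing (assumed truncated PRF)
    """
    w_max = max(weights)
    h = len(weights)
    # dist[j] = probability that exactly j higher-scored tuples exist (j < h only)
    dist = [1.0] + [0.0] * (h - 1)
    heap = []  # min-heap holding the k best (prf_value, tuple_id) seen so far
    evaluated = 0

    for tid, _score, p in sorted(tuples, key=lambda t: -t[1]):
        # Cheap upper bound (illustrative assumption): PRF(t) <= p * max weight.
        bound = p * w_max
        if len(heap) < k or bound > heap[0][0]:
            # Exact PRF value: the tuple must exist and have j higher-scored tuples present.
            value = p * sum(w * d for w, d in zip(weights, dist))
            evaluated += 1
            if len(heap) < k:
                heapq.heappush(heap, (value, tid))
            elif value > heap[0][0]:
                heapq.heapreplace(heap, (value, tid))
        # Fold this tuple into the rank distribution used by lower-scored tuples.
        for j in range(h - 1, 0, -1):
            dist[j] = dist[j] * (1 - p) + dist[j - 1] * p
        dist[0] *= 1 - p

    return sorted(heap, reverse=True), evaluated


if __name__ == "__main__":
    data = [("a", 10, 0.5), ("b", 9, 1.0), ("c", 8, 0.2)]
    print(prf_topk(data, weights=[1.0, 0.5], k=1))
```

In this toy run tuple `c` is never evaluated exactly, because its bound (0.2) cannot beat the best value already found. A distributed version would need a way to share or approximate the current k-th value across mappers, which is where the paper's two strategies would come in.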