检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:韩希先[1] 宋翠 戈韵如 高宏[1] 李建中[1] HAN Xixian;SONG Cui;GE Yunru;GAO Hong;LI Jianzhong(School of Computer Science and Technology,Harbin Institute of Technology,Harbin 150001,China)
机构地区:[1]哈尔滨工业大学计算机科学与技术学院,哈尔滨150001
出 处:《计算机科学与探索》2019年第5期775-787,共13页Journal of Frontiers of Computer Science and Technology
基 金:国家自然科学基金Nos.61632010;61502121;61872106;国家重点研发计划No.2016YFB1000703;威海市大学共建项目No.ZMZ001702~~
摘 要:在许多应用中,Skyline查询是一种十分重要的查询类型,它在潜在的巨大的数据空间中返回不被其他元组支配的用户感兴趣的元组,但是Skyline查询无法控制返回结果的数量。处理一个新的top-k Skyline查询问题,该查询返回支配分数最大的k个Skyline元组,从而控制了需要向用户返回的查询结果数量。分析发现,大多数现有算法忽略了利用支配分数作为限制Skyline查询的结果数量的度量。提出一个新的基于表扫描的RSTS(ranked Skyline with table scan)算法来有效计算海量数据上的top-k Skyline结果。RSTS算法首先对表执行预排序操作,保证预排序表的元组按照对有序列表的round-robin扫描的顺序排列。RSTS算法包括两个阶段。阶段1利用对预排序表的顺序扫描来获得候选元组。阶段2计算候选元组的支配分数并返回结果。可以证明,RSTS算法具有早结束特性,并给出其扫描深度的理论分析。提出对于候选元组的剪切操作,理论剪切效果表明,绝大多数的Skyline结果可以直接丢弃。实验结果表明,RSTS算法可以有效计算top-k Skyline结果。In many applications, Skyline is an important operation to return a set of interesting tuples which are not dominated by other tuples in a potentially huge data space. The problem of Skyline is that it cannot control the size of Skyline results. This paper deals with a novel top-k Skyline query, which returns k Skyline tuples with the largest domination scores. It is found that the majority of existing algorithms ignore the application of domination scores as a metric of size limitation in Skyline query. This paper proposes a novel table-scan-based algorithm RSTS (ranked Skyline with table scan) to compute top- k Skyline query on massive data efficiently. RSTS first presorts table to arrange the tuples in the order of round- robin retrieval on sorted lists. The execution of RSTS consists of two phases. In phase 1, RSTS scans the presorted table sequentially and maintains the candidate tuples. In phase 2, RSTS calculates the domination scores of the candidates and returns query results. It is proven that RSTS has the characteristic of early termination, along with the theoretical analysis of scan depths. The pruning rule for candidate tuples is devised in this paper. The theoretical pruning effect shows that majority of the Skyline results can be discarded directly. The extensive experimental results show that RSTS can compute top-k Skyline results efficiently.
关 键 词:海量数据 top-kSkyline RSTS算法 表扫描 剪切操作
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.3