高速流环境下近似连续k代表轮廓查询算法  被引量:1

Approximate Continuous k Representative Skyline Query Algorithm over High-Speed Streaming Data Environment

在线阅读下载全文

作  者:朱睿 宋栿尧 王斌 杨晓春 张安珍 夏秀峰 ZHU Rui;SONG Fu-Yao;WANG Bin;YANG Xiao-Chun;ZHANG An-Zhen;XIA Xiu-Feng(School of Computer Science,Shenyang Aerospace University,Shenyang 110136,China;School of Computer Science and Engineering,Northeastern University,Shenyang 110169,China)

机构地区:[1]沈阳航空航天大学计算机学院,辽宁沈阳110136 [2]东北大学计算机科学与工程学院,辽宁沈阳110169

出  处:《软件学报》2023年第3期1425-1450,共26页Journal of Software

基  金:国家自然科学基金(62102271,62072088,61991404);国家重点研发计划(2020YFB1707901);沈阳市创新人才项目(RC200439)。

摘  要:k代表轮廓查询是从传统轮廓查询中衍生出来的一类查询.给定多维数据集合D,轮廓查询从D中找到所有不被其他对象支配的对象,将其返回给用户,便于用户结合自身偏好选择高质量对象.然而,轮廓对象规模通常较大,用户需要从大量数据中进行选择,导致选择速度和质量无法得到保证.与传统轮廓查询相比,k代表轮廓查询从所有轮廓对象中选择“代表性”最强的k个对象返回给用户,有效地解决了传统轮廓查询存在的这一问题.给定滑动窗口W和连续查询q,q监听窗口中的数据.当窗口滑动时,查询q返回窗口中,组合支配面积最大的k个对象.现有算法的核心思想是:实时监测当前窗口中的轮廓对象集合,当轮廓对象集合更新时,算法更新k代表轮廓.然而,实时监测窗口中,轮廓集合的计算代价通常较大.此外,当轮廓集合规模较大时,从中选择k代表轮廓的计算代价是同样巨大的,导致已有算法无法在高速流环境下使用.针对上述问题,提出了ρ-近似k代表轮廓查询.为了支持该查询,提出了查询处理框架PAKRS(predict-based approximate k representative skyline).首先,PAKRS利用高速流的特性对当前窗口进行划分,根据划分结果构建未来窗口预测结果集,用其预测新流入窗口数据成为轮廓对象的最早时间.其次,提出了索引ρ-GRID.它帮助PAKRS在2维和d维(d>2)环境下,分别以O(k/s+k/m)和O(2Ld/m+2Ld/s)的增量维护代价下筛选近似k代表轮廓,L是一个小于k的正整数.由理论分析证明可知,PAKRS的计算复杂度小于前人所提的算法计算复杂度.最后,通过大量实验对所提算法性能进行评估.结果表明,PAKRS的运行时间是PBA(prefix-based algorithm)算法的1/4、GA(greedy algorithm)算法的1/6、ε-GA(ε-constraint greedy algorithm)算法的1/3.k representative skyline query is a type of query derived from traditional skyline query. Given a set of d-dimensional dataset D,a skyline query finds all objects in D that are not dominated by other ones, which helps users to select high-quality objects based on their preference. However, the scale of skyline objects may be large in many cases, users have to choose target objects from a larg e number of objects, leading that both the selection speed and quality cannot be guaranteed. Compared with traditional skyline query, k representative skyline query chooses the most “representative” k objects from all skyline objects, which effectively solves such problem causes by traditional skyline query. Given the sliding window W and a continuous query q, q monitors objects in the window. When the window slides, q returns k skyline objects with the largest group dominance size in the window. The key behind existing algorithms is to monitor skyline objects in the current window. When the skyline set is updated, the algorithm updates k representative skyline set. However, the cost of monitoring skyline set is usually high. When the skyline set scale is large, the computational cost of choosing k representative skyline objects is also high. Thus, existing algorithms cannot efficiently work under high-speed stream environment. This study proposes a query named ρ-approximate k representative skyline query. In order to support this type of queries, a novel framework is proposed named PAKRS(predict-based approximate k representative skyline). Firstly, PAKRS partitions the current window into a group of sub-windows. Next, the predicted result sets of a few future windows are constructed according to the partition result. In this w ay, the earliest moment can be predicted when new arrived objects may become skyline objects. Secondly, an index is proposed named ρ-GRID,which can help PAKRS to select ρ-approximate k representative skyline with O(k/s+k/m) computational cost under 2-dimensional space,and O(2Ld/m+2Ld/s) computa

关 键 词:轮廓查询 k代表轮廓查询 滑动窗口 分片 高速流 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象