Authors: 杨良怀 (YANG Liang-Huai) [1]; 卢晨曦 (LU Chen-Xi); 范玉雷 (FAN Yu-Lei) [1]; 朱镇洋 (ZHU Zhen-Yang); 潘建 (PAN Jian)
Affiliations: [1] School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, Zhejiang, China; [2] Zhijiang College, Zhejiang University of Technology, Shaoxing 312030, Zhejiang, China
Source: Journal of Software (《软件学报》), 2021, No. 11, pp. 3576-3595 (20 pages)
Funding: National Key Research and Development Program of China (2020YFB1707700).
Abstract: Efficient storage and indexing of big data streams are challenging problems in the database field. For data streams that carry a time attribute, the stream is segmented into consecutive time windows, and WB-Index, a distributed master-slave index structure based on a double-layer B+ tree, is proposed. The lower B+ tree index is built on the stream tuples within each time window; the upper B+ tree index is built over the successive time windows. The lower index is constructed by combining sort-based batch loading with parallel sorting. The core idea is to slice each time window further and separate the parallelizable operations from the rest: data stream receiving and per-slice sorting run in parallel, and the construction of the B+ tree skeleton (a B+ tree without values) for the window is parallelized with the merge-sort step. These techniques speed up B+ tree construction. Because time window timestamps increase monotonically and without bound, the upper index is built with a split-free method that avoids node splitting and memory-movement overhead and improves space utilization and update efficiency. WB-Index stores stream tuples and indexes separately, and caches as much of the double-layer B+ tree index and hotspot data in memory as possible to improve query performance. Theoretical analysis and experimental results show that WB-Index supports efficient real-time data stream writing and stream data querying, and is well suited to data streams with a time attribute.
Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]
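Illustrative sketch: the two construction ideas named in the abstract (slice, sort, and merge a window before bulk loading it; append to the upper index without splitting nodes) can be illustrated on a single machine roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `build_window_index`, `search_window`, `WindowDirectory`, and `LEAF_CAPACITY` are hypothetical, a thread pool stands in for the distributed workers, and a flat list of separator keys stands in for the internal B+ tree levels.

```python
import bisect
import heapq
from concurrent.futures import ThreadPoolExecutor
from operator import itemgetter

LEAF_CAPACITY = 4  # hypothetical fan-out, kept small for illustration


def build_window_index(slices, capacity=LEAF_CAPACITY):
    """Sort each slice concurrently, merge the sorted runs, and pack full leaves."""
    key = itemgetter(0)
    with ThreadPoolExecutor() as pool:
        # Threads only stand in for the parallel sorters described in the abstract.
        runs = list(pool.map(lambda s: sorted(s, key=key), slices))
    merged = list(heapq.merge(*runs, key=key))
    # Bulk loading: leaves are filled left to right, so no leaf is ever split.
    leaves = [merged[i:i + capacity] for i in range(0, len(merged), capacity)]
    # Assumption: one separator per leaf replaces the internal tree levels.
    separators = [leaf[0][0] for leaf in leaves]
    return separators, leaves


def search_window(index, key):
    """Return all values stored under `key` in one window's index."""
    separators, leaves = index
    found = []
    # Start one leaf early: the previous leaf may end with entries for `key`.
    pos = max(bisect.bisect_left(separators, key) - 1, 0)
    for leaf in leaves[pos:]:
        if leaf[0][0] > key:
            break
        found.extend(v for k, v in leaf if k == key)
    return found


class WindowDirectory:
    """Append-only upper index keyed by window start timestamp."""

    def __init__(self, capacity=LEAF_CAPACITY):
        self.capacity = capacity
        self.leaves = [[]]  # each leaf holds (window_start, window_index) pairs

    def append(self, window_start, window_index):
        last = self.leaves[-1]
        if last and window_start <= last[-1][0]:
            raise ValueError("window timestamps must be strictly increasing")
        if len(last) == self.capacity:
            # Open a new rightmost leaf instead of splitting the full one,
            # so existing leaves stay 100% full and nothing is moved.
            last = []
            self.leaves.append(last)
        last.append((window_start, window_index))

    def find(self, timestamp):
        """Return the index of the latest window starting at or before `timestamp`."""
        starts = [leaf[0][0] for leaf in self.leaves if leaf]
        i = bisect.bisect_right(starts, timestamp) - 1
        if i < 0:
            return None
        leaf = self.leaves[i]
        j = bisect.bisect_right([s for s, _ in leaf], timestamp) - 1
        return leaf[j][1]


if __name__ == "__main__":
    directory = WindowDirectory(capacity=2)
    for start in (0, 10, 20):
        slices = [[(start + 3, "c"), (start + 1, "a")], [(start + 2, "b")]]
        directory.append(start, build_window_index(slices, capacity=2))
    print(search_window(directory.find(12), 12))
```

A query first locates the window through `WindowDirectory.find` and then searches that window's lower index. Because window start times only grow, every insertion lands in the rightmost leaf, which is why filling it completely before opening a new one is safe in this sketch.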