检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]哈尔滨工业大学计算机学院,哈尔滨150080
出 处:《计算机研究与发展》2010年第5期866-877,共12页Journal of Computer Research and Development
基 金:国家自然科学基金项目(60473075);黑龙江省自然科学基金项目(zjg03-05)
摘 要:XML已成为信息交换和表示的标准.对XML数据的查询将返回满足特定约束的XML节点子集.对于大文件的XML数据的查询处理通常分为两步:1.为该XML数据建立一个索引;2.在索引上完成查询处理无需访问源文档.XML索引为查询处理提供了高效的帮助,其中F&B索引是已知的处理分枝查询最小的索引,但快速创建F&B索引和利用F&B索引完成查询处理的算法却很少有人研究.提出了一种素数序列标记法,这种标记法不仅有助于快速地建立F&B索引,更可以高效地完成F&B索引上的查询处理.此外,还给出了F&B索引上的区间标记法与CCPI的创建过程,这两种编码创建过程无需在建立F&B索引后二次创建,仅需与F&B索引创建过程一起对文档使用SAX解析器分析一次即可得到.这样,可以在F&B索引的区间标记法上使用TwigStack算法执行查询处理,在F&B索引的CCPI标记法上使用关联路径连接算法执行查询处理.还给出了基于素数序列标记法的查询处理算法,即素数整除匹配算法,该算法可以高效地判定某节点是否有某分枝子结构.实验表明基于素数序列标记法的F&B索引创建方法比SAM算法快,在多个数据集F&B索引上素数整除匹配算法优于关联路径连接算法和TwigStack算法.XML is widely used as a standard of information exchange and representation. Queries on XML can retrieve a subset of XML data nodes satisfying certain constraints. Queries on large XML data are usually processed in two steps: 1. An index of XML nodes is created; 2. Queries are processed on that index without accessing XML data. XML index provides high efficiency for XML query processing. Particularly, F&B-index is the smallest index that supports twig query processing. However, few researches are proposed on how to efficiently create F&B-Index and how to process queries based on F&B-Index. Proposed in this paper is a new labeling scheme called prime sequence. This labeling scheme helps not only on creating an F&B-Index but also on efficient query processing. With prime sequence labeling, an F&B-index can be created by parsing the XML document only once with a SAX parser. Further, region encoding and CCPI on F&B-Index can be created during the creation of F&B-Index. Thus, TwigStack algorithm and related-path-join algorithm can be used to process queries on created F&B-Index. Also proposed is an efficient algorithm named division match ,over F&B-Index. The algorithm can efficiently judge relationship between two nodes based on a property of prime sequence labeling. Experiments show that prime sequence labeling provides high efficiency on creating F&B-Index and high efficiency on query processing on different datasets.
关 键 词:XML 索引 F&B索引 素数序列标记法 CCPI TwigStack
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:18.217.200.151