一种支持通配符查询的XML模式匹配算法  被引量:2

An Efficient XML Pattern Matching Algorithm for Supporting Wildcard Query

在线阅读下载全文

作  者:陈冲[1] 蒋夏军[1] 

机构地区:[1]南京航空航天大学计算机科学与技术学院,江苏南京211106

出  处:《计算机与现代化》2016年第4期65-73,共9页Computer and Modernization

基  金:江苏省自然科学基金资助项目(BK20140826)

摘  要:XML查询语言当中,包含通配符*的查询能够方便有效地满足一些特殊查询要求,但在大数据时代下XML文件容量与结构复杂性不断增加,现有支持通配符查询的算法需消耗巨量内存来解析XML,并且在对嵌套通配符处理时需要大量的单路径匹配操作和局部结果的缓存。针对此现状,结合现有经典算法,提出一种新的、能够高效解决小枝模式当中含有通配符*的查询算法—WTwig List。该算法首先对查询模式进行通配符的层次关系处理,减少不必要的通配符匹配,以数据流形式解析XML文件并执行局部的扩展Dewey编码,经过滤操作后得到有序的叶子节点编码列表,在列表中执行匹配操作得到结果;其次在真实和合成数据集上做大量实验,结果表明WTwig List算法与现有算法相比,能够有效提高查询效率,在空间效率上具有一定优势,且能够快速准确地处理查询模式中P-C关系。In XML query language,the wildcard query which includes "* "can effectively meet some special query requirements. But in the big data era,with the increasing of the XML file size and structural complexity,the existing algorithms which support wildcard query need huge amounts of memory to parse XML file and also need many single path matching operations and local result caching. Aiming at this situation,we propose a new XML pattern matching algorithm named WTwig List to solve the twig pattern containing the wildcard effectively. First,the hierarchical relationship of wildcard in the query pattern is processed to reduce unnecessary wildcard matching. Then the XML file is parsed as data stream pattern and the local Extended Dewey encoding is executed. After filtering operation,the ordered list of leaf node encoding is gotten,and the matching results can get from the list matching operations. A set of experimental result on both real-life and synthetic dataset demonstrates that WTwig List improves query efficiency andis of advantages in space efficiency,and it can deal with the P-C relationship quickly and accurately.

关 键 词:通配符查询 流数据处理 扩展Dewey编码 XML模式匹配 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象