并行的XML数据流模式匹配算法  

Parallel Matching Algorithm on XML Stream Data Pattern

在线阅读下载全文

作  者:陈冲[1] 蒋夏军[1] 张青平[1] 

机构地区:[1]南京航空航天大学计算机科学与技术学院,南京210016

出  处:《计算机科学与探索》2015年第12期1439-1449,共11页Journal of Frontiers of Computer Science and Technology

基  金:江苏省自然科学基金~~

摘  要:随着大数据时代的到来,大规模XML文件不断地涌现,其信息庞大,结构复杂,而传统的XML查询匹配技术需要大量的存储空间和预解析工作,不能有效解决XML大文件的匹配要求。针对这种现状,分析了现有经典匹配算法核心思想,并结合多线程并行相关知识,提出了一种新的并行的XML数据流模式匹配算法,称为并行路径流算法(parallel path stream,PPS)。该算法在以流模式顺序解析XML文件的过程中,缓存以查询模式根元素为根节点的子树,以顺序链表存储节点的编码信息,在进行有效过滤后加入任务链表中,采用独特的匹配方法并行操作任务池中的各个顺序链表后得到匹配结果。实验表明,该算法能够明显减少存储空间,其过滤过程和并行操作能够有效减少匹配时间,并在查询路径长度方面具有一定优势。With the arrival of the big-data era, XML files with large scale, huge information and complicated structure have emerged constantly. But the previous XML query matching algorithms can’t effectively solve the matching problem because they need lots of storage space and preliminary parse work. This paper proposes a new parallel XML stream data pattern matching algorithm named PPS(parallel path stream), which is based on the analysis of the core idea of the existing classic matching algorithm and integrates the knowledge on multi-thread parallel. In the process of sequentially parsing XML file as stream pattern, subtree, whose root is the root element of the query pattern, is stored as ordered list and is added into the task list after an effective filtration. Then the matching result can be gotten through parallel operating on the ordered list. The experimental results demonstrate that the PPS can significantly reduce the storage space, the filtration process and the parallel operation can effectively reduce the matching time, beyond that, the PPS has certain advantage in the query path length.

关 键 词:XML模式匹配 流数据处理 路径栈算法 多线程 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象