XML Streaming Data Processing Method Based on Forest Transducer  Cited by: 1


Authors: HE Zhi-xue [1]; LIAO Hu-sheng [2]

Affiliations: [1] Computer and Remote Sensing Information Technology Institute, North China Institute of Aerospace Engineering, Langfang 065000, Hebei, China; [2] Department of Computer Science, Beijing University of Technology, Beijing 100124, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2018, No. 10, pp. 3092-3099 (8 pages)

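Funding: National Natural Science Foundation of China, Young Scientists Fund (61202074); Beijing Natural Science Foundation (4122011); Hebei Provincial Department of Education Youth Fund (QN2016248); Hebei Province Science and Technology Plan Fund (15210126)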

Abstract: Streaming data arrives online in real time, can be read sequentially only once, and must be processed promptly with a small buffer. To meet these requirements, a method for processing XPath queries based on a forest transducer is proposed. Conversion rules from an XPath query to a forest transducer instance are defined. A stack combined with an abstract syntax tree continuously receives stream data nodes, which drive the transducer to perform node matching and state transitions. The abstract syntax tree maintains the relationships between state functions together with their intermediate results, and query results are output as soon as they are obtained during reduction. Experimental results confirm that the method is effective for streaming data: on standard benchmark datasets, it improves processing efficiency by nearly 30% compared with similar methods and engines, especially for deeply nested data, while memory usage stays nearly constant. The method achieves a good balance between time and space complexity and offers a useful reference for other methods.
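The general idea of driving a query automaton with stream events and a stack can be illustrated with a minimal sketch. This is not the authors' forest transducer and it omits the abstract syntax tree, predicates, and reduction step described in the abstract; it only handles `/` and `//` steps with name tests. The function names `parse_path` and `stream_match` are hypothetical, chosen for this illustration.

```python
import io
import xml.etree.ElementTree as ET


def parse_path(xpath):
    """Split '/a//b/c' into [('child', 'a'), ('desc', 'b'), ('child', 'c')]."""
    steps = []
    i = 0
    while i < len(xpath):
        if xpath.startswith("//", i):
            axis, i = "desc", i + 2
        elif xpath.startswith("/", i):
            axis, i = "child", i + 1
        else:
            raise ValueError("expected '/' or '//' at position %d" % i)
        j = i
        while j < len(xpath) and xpath[j] != "/":
            j += 1
        if j == i:
            raise ValueError("empty step at position %d" % i)
        steps.append((axis, xpath[i:j]))
        i = j
    return steps


def stream_match(xml_bytes, xpath):
    """Yield the path of each matching element as soon as its start tag is seen."""
    steps = parse_path(xpath)
    final = len(steps)
    # Each stack entry is the set of automaton states active at one open element.
    # State k means "the first k steps of the query have been matched".
    stack = [frozenset([0])]
    path = []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            path.append(elem.tag)
            nxt = set()
            for k in stack[-1]:
                if k == final:
                    continue  # query already complete at the parent
                axis, name = steps[k]
                if name == "*" or name == elem.tag:
                    nxt.add(k + 1)  # this element matches step k
                if axis == "desc":
                    nxt.add(k)  # a deeper descendant may still match step k
            stack.append(frozenset(nxt))
            if final in nxt:
                yield "/" + "/".join(path)
        else:
            stack.pop()
            path.pop()
            # Drop the element's content; the parent still holds the emptied element.
            elem.clear()


doc = b"<a><x><b><c/></b></x><b><c/><d/></b></a>"
for hit in stream_match(doc, "/a//b/c"):
    print(hit)
```

In this sketch the stack depth is bounded by the nesting depth of the document rather than its length, which is the property that makes stack-driven approaches attractive for streams.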

Keywords: streaming data; forest transducer; query processing; XPath query; XML data

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
