Method of similarity measures for XML documents structure based on TreeMiner algorithm (times cited: 2)

Authors: Yan Hongcan (阎红灿) [1,2], Wang Shufen (王淑芬) [3], Zhu Xiaoliang (朱晓亮) [3], Li Minqiang (李敏强) [1], Liu Baoxiang (刘保相) [2]

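Affiliations: [1] School of Management, Tianjin University, Tianjin 300072; [2] College of Science, Hebei Polytechnic University, Tangshan, Hebei 063009; [3] Computing Center, Hebei Polytechnic University, Tangshan, Hebei 063009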

Source: Application Research of Computers (《计算机应用研究》), 2009, No. 5, pp. 1706-1709, 1722 (5 pages)

Funding: Natural Science Foundation of Hebei Province (F2006000377); Specialized Research Fund for the Doctoral Program of Higher Education (20020056047)

Abstract: This paper proposes a method for measuring the structural similarity of XML documents. The method is based on frequent subtrees mined with the TreeMiner algorithm. It addresses two problems in existing approaches. Tree edit distance has a high computational cost, and path-matching methods cannot handle repeated labels. The method builds a new retrieval model, the frequent structure vector model (FSVM). It defines a structure-vector representation of documents and a weight function, and from these derives a formula for the structural similarity of two XML documents. The authors also improve TreeMiner in both its data structures and its mining procedure, so that it is better suited to mining structure in large document collections. Experimental results show that the method achieves high precision and accuracy.
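The abstract does not give the FSVM weight function or the similarity formula. The following Python sketch therefore shows only the general shape of a frequent-structure vector model, under the following assumptions:

* Each document is a sparse vector indexed by frequent subtrees. Subtrees are written in a TreeMiner-style preorder string encoding, where -1 marks a backtrack to the parent.
* Vectors are weighted with a tf-idf-style function, chosen here purely as a stand-in.
* Documents are compared by cosine similarity.

The names `treeminer_encode`, `weight_vectors` and `cosine`, the example documents, and the weight formula are all hypothetical. The frequent-subtree mining step itself is not shown; the occurrence counts are simply given as input.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical tree representation for illustration: (label, [children]).
Tree = Tuple[str, List["Tree"]]


def treeminer_encode(node: Tree) -> str:
    """Preorder string encoding in the style of TreeMiner: a label is emitted
    when its node is visited, and -1 marks each backtrack from a child to its
    parent (no -1 follows the root)."""
    label, children = node
    parts = [label]
    for child in children:
        parts.append(treeminer_encode(child))
        parts.append("-1")
    return " ".join(parts)


def weight_vectors(doc_counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Map each document to a sparse vector over frequent subtrees.

    ASSUMPTION: the weight tf * log(1 + N / df) is a tf-idf-style stand-in;
    the paper's own weight function is not given in the abstract.
    """
    n_docs = len(doc_counts)
    df: Dict[str, int] = {}  # number of documents containing each subtree
    for counts in doc_counts.values():
        for pattern, tf in counts.items():
            if tf > 0:
                df[pattern] = df.get(pattern, 0) + 1
    return {
        doc: {p: tf * math.log(1 + n_docs / df[p]) for p, tf in counts.items() if tf > 0}
        for doc, counts in doc_counts.items()
    }


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical input: per-document occurrence counts of already-mined frequent subtrees.
s1 = treeminer_encode(("book", [("title", []), ("author", [])]))
s2 = treeminer_encode(("book", [("title", [])]))
s3 = treeminer_encode(("article", [("title", [])]))
docs = {"d1": {s1: 2, s2: 3}, "d2": {s1: 1, s2: 2}, "d3": {s3: 4}}

vecs = weight_vectors(docs)
print(s1)                                        # encoded subtree key
print(round(cosine(vecs["d1"], vecs["d2"]), 3))  # structurally similar documents
print(cosine(vecs["d1"], vecs["d3"]))            # no shared frequent subtrees
```

Cosine similarity is used here because it normalizes for document size, which is a common choice in vector space retrieval models. It is not necessarily the formula derived in the paper. Because subtrees are keyed by their full encoding rather than by root-to-leaf paths, two subtrees that share repeated labels in different arrangements remain distinct vector dimensions.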

Keywords: frequent structure vector model; embedded subtree; frequent subtree; structure mining

CLC number: TP311 [Automation and Computer Technology / Computer Software and Theory]
