Calculating Similarity of XML Documents by Weighted Pq-gram Algorithm

Times cited: 1

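Affiliation: [1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016, China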

Source: Computer and Modernization, 2015, No. 3, pp. 20-25 (6 pages)

Funding: National Natural Science Foundation of China (61202350)

Abstract: Clustering XML documents is an important means of managing them efficiently, and computing the similarity between XML documents is the key step in that process. The pq-gram algorithm is an effective way to compute XML document similarity, but it ignores the ordering of nodes in XML documents. The weighted pq-gram algorithm builds on it by exploiting the structural characteristics of XML documents. It first assigns a weight to each node, then assigns a weight to each pq-gram based on the weights of its nodes, and finally applies these weights when computing XML document similarity. Experimental results show that the weighted pq-gram algorithm better describes each node's contribution to XML document similarity and improves the precision of the similarity calculation.
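The abstract does not state the paper's actual weighting formulas. The following Python sketch is therefore illustrative only. It builds the standard pq-gram profile of an XML tree, where each pq-gram is a stem of p labels (the anchor node and its p-1 ancestors) plus a base of q consecutive child labels, taken from a tree padded with null nodes. It then compares two profiles with a weighted bag overlap. Three pieces are hypothetical assumptions, not the authors' method: the depth-based node weight `depth_weight`, the rule that each pq-gram inherits its anchor node's weight, and the similarity formula `pq_gram_similarity`. With all weights equal to 1, the score reduces to 1 minus the usual pq-gram distance. The defaults p=2 and q=3 are chosen only for illustration.

```python
"""Sketch: pq-gram profiles of XML trees plus a HYPOTHETICAL weighted similarity."""
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Callable, Optional

NULL = "*"  # label of the dummy (null) nodes in the extended tree


def pq_gram_profile(root: ET.Element, p: int = 2, q: int = 3,
                    weight: Optional[Callable[[int], float]] = None) -> Counter:
    """Map each pq-gram (p stem labels + q base labels) to its total weight.

    weight(depth) is an ASSUMED weight for the anchor node at that depth; each
    pq-gram inherits its anchor's weight. With weight=None every pq-gram counts 1,
    which gives the ordinary pq-gram bag.
    """
    profile: Counter = Counter()

    def walk(node: ET.Element, stem: tuple, depth: int) -> None:
        # Shift the stem window: drop the oldest ancestor, append this node.
        stem = stem[1:] + (node.tag,)
        w = 1.0 if weight is None else weight(depth)
        base = (NULL,) * q
        children = list(node)
        if not children:
            # A leaf gets q null children, i.e. exactly one pq-gram.
            profile[stem + base] += w
            return
        for child in children:
            # Slide the base window over the ordered child list
            # (the initial nulls act as q-1 null children on the left).
            base = base[1:] + (child.tag,)
            profile[stem + base] += w
            walk(child, stem, depth + 1)
        # q-1 null children pad the right end of the child list.
        for _ in range(q - 1):
            base = base[1:] + (NULL,)
            profile[stem + base] += w

    # Start with p nulls; the first shift leaves p-1 null ancestors above the root.
    walk(root, (NULL,) * p, 0)
    return profile


def depth_weight(depth: int) -> float:
    """HYPOTHETICAL node weight: nodes nearer the root count more."""
    return 1.0 / (1 + depth)


def pq_gram_similarity(a: Counter, b: Counter) -> float:
    """Weighted bag overlap 2*sum(min) / (total(a) + total(b)).

    With unit weights this equals 1 minus the standard pq-gram distance.
    """
    shared = sum(min(a[g], b[g]) for g in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 1.0


if __name__ == "__main__":
    doc1 = ET.fromstring("<book><title/><author/><year/></book>")
    doc2 = ET.fromstring("<book><title/><author/><price/></book>")
    plain = pq_gram_similarity(pq_gram_profile(doc1), pq_gram_profile(doc2))
    weighted = pq_gram_similarity(pq_gram_profile(doc1, weight=depth_weight),
                                  pq_gram_profile(doc2, weight=depth_weight))
    print(plain)               # 0.5
    print(round(weighted, 4))  # 0.4615
```

In this toy example, the two documents differ only in the root's last child. Under the assumed depth weights, the differing pq-grams anchored at the root count more than the matching pq-grams anchored at the leaves, so the weighted score (6/13) falls below the unweighted one (0.5). A different weighting scheme would shift the scores differently.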

Keywords: XML document; similarity calculation; pq-gram; weight

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
