Semantic Edit Distance between Two Directed Labeled and Rooted Trees (有向标记根树之间的语义编辑距离)


Authors: Kang Qi (康琪)[1], Ma Jun (马军)[1]

Affiliation: [1] School of Computer Science and Technology, Shandong University, Jinan 250101

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2011, No. 6, pp. 816-824 (9 pages)

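Funding: Supported by the National Natural Science Foundation of China (No.60970047), the China Postdoctoral Science Foundation (No.20100471503), the Natural Science Foundation of Shandong Province (No.Y2008G19), and the Shandong Province Science and Technology Key Project (No.2007GG10001002; 2008GG10001026)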

Abstract: The tree edit distance (TED) between two directed labeled and rooted trees is a widely studied problem in graph theory. Computing TED is a combinatorial optimization problem, and it is widely used to measure the structural similarity of semi-structured documents. This paper proposes the concept of tree semantic edit distance (TSED) between two directed labeled and rooted trees and gives a formula for computing it. TED and TSED are then combined into a single distance measure, which is applied to clustering the document object model (DOM) trees of extensible markup language (XML) documents by structure. Experimental results show that the proposed measure clearly outperforms measures based on TED alone in clustering precision and recall. The time complexity of the proposed algorithm matches that of the best dynamic-programming algorithms for TED.
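For readers unfamiliar with TED, the following is a minimal sketch of the standard ordered-tree edit distance recurrence, written as a memoized recursion over forests. It is not the algorithm from the paper and is not an optimized method such as Zhang-Shasha. The tuple-based tree encoding, the function names, and the unit insert/delete costs are assumptions made for illustration. The abstract does not give the TSED formula, so the `relabel_cost` hook and the `SYNONYMS` table below are purely hypothetical ways to make label substitution sensitive to meaning.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees. A forest is a tuple of trees. Tuples keep everything hashable.

def tree_size(tree):
    """Number of nodes in a tree."""
    _, children = tree
    return 1 + sum(tree_size(c) for c in children)

def tree_edit_distance(t1, t2, relabel_cost=lambda a, b: 0 if a == b else 1):
    """Ordered tree edit distance with unit insert/delete costs.

    relabel_cost(a, b) is the cost of turning label a into label b.
    """
    @lru_cache(maxsize=None)
    def forest_dist(f1, f2):
        if not f1 and not f2:
            return 0
        if not f1:  # insert every remaining node of f2
            return sum(tree_size(t) for t in f2)
        if not f2:  # delete every remaining node of f1
            return sum(tree_size(t) for t in f1)
        (l1, c1), (l2, c2) = f1[-1], f2[-1]
        # Delete the root of the rightmost tree in f1; its children move up.
        d_del = forest_dist(f1[:-1] + c1, f2) + 1
        # Insert the root of the rightmost tree in f2.
        d_ins = forest_dist(f1, f2[:-1] + c2) + 1
        # Map the two rightmost roots onto each other.
        d_map = (forest_dist(f1[:-1], f2[:-1])
                 + forest_dist(c1, c2)
                 + relabel_cost(l1, l2))
        return min(d_del, d_ins, d_map)

    return forest_dist((t1,), (t2,))

# Hypothetical semantic hook: related labels are cheaper to substitute.
SYNONYMS = {frozenset(("author", "writer"))}

def semantic_relabel_cost(a, b):
    if a == b:
        return 0
    return 0.5 if frozenset((a, b)) in SYNONYMS else 1

if __name__ == "__main__":
    doc_a = ("book", (("title", ()), ("author", ())))
    doc_b = ("book", (("title", ()), ("writer", ())))
    print(tree_edit_distance(doc_a, doc_b))
    print(tree_edit_distance(doc_a, doc_b, semantic_relabel_cost))
```

With the default cost, the two small DOM-like trees differ by one relabel. With the hypothetical semantic cost, the same substitution is charged less because the two labels are treated as related.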

Keywords: tree edit distance; document clustering; structural similarity; semantic similarity

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
