检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]山东大学计算机科学与技术学院,济南250101
出 处:《模式识别与人工智能》2011年第6期816-824,共9页Pattern Recognition and Artificial Intelligence
基 金:国家自然科学基金项目(No.60970047);中国博士后科学基金项目(No.20100471503);山东省自然科学基金项目(No.Y2008G19);山东省科技攻关项目(No.2007GG10001002;2008GG10001026)资助
摘 要:有向标记根树之间的编辑距离(TED)被广泛应用在文档的结构化相似度计算上.文中提出有向标记根树之间的语义编辑距离(TSED)的概念,并给出计算公式.组合TED和TSED形成距离测度,并应用在XML文档的结构聚类上.实验表明该距离模型在结构化聚类的准确率和召回率上明显优于单纯利用TED算法的聚类结果.该算法在时间复杂性上也等同于利用动态规划计算TED的最好算法.In graph theory, the tree edit distance (TED) between two directed labeled and rooted trees is a popular research issue. As a combination optimization problem, calculating TED is widely used in the detection of the structural similarity of semi-structural documents. In this paper, a concept named tree semantic edit distance (TSED) with the corresponding formula is proposed. Then a distance measure based on both TED and TSED is presented. The proposed distance is applied in clustering the document object model (DOM) trees of extensible markup language (XML) documents. Experimental results show the proposed measure is better than those used TED only in terms of clustering precision and recall. The time complexity of the proposed algorithm is the same as those of algorithms for TED based on dynamic programming.
分 类 号:TP391.1[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:3.136.26.17