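Affiliation: [1] School of Computer Science and Technology, Nanjing Normal University, Nanjing, Jiangsu 210097
Source: Journal of Nanjing Normal University (Engineering and Technology Edition), 2009, No. 2, pp. 61-64 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (40771163)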
Abstract: This paper presents MCF-CLU, a new algorithm for clustering GML documents by structure. Unlike related algorithms, it clusters on the basis of closed frequent induced subtrees and requires no pairwise similarity comparison between trees. Instead, it mines the closed frequent induced subtrees of the GML document database, selects one closed frequent induced subtree for each document as that document's representative tree, and groups documents that share the same representative tree into one cluster. The number of clusters is determined automatically during clustering, a description is produced for each cluster, and outliers (isolated points) can be detected. Experimental results show that MCF-CLU is effective and that it outperforms other algorithms of the same kind.
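Keywords: closed frequent induced subtree; GML structural clustering; clustering
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]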
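The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes the closed frequent induced subtrees have already been mined and are supplied per document as (canonical string, node count) pairs, and it assumes the representative tree is the largest such subtree, a rule the abstract does not specify. The function name `cluster_by_representative`, the input layout, and the sample file names are all hypothetical.

```python
from collections import defaultdict


def cluster_by_representative(doc_subtrees):
    """Group documents that share the same representative subtree.

    doc_subtrees maps a document id to a list of (canonical_form, node_count)
    pairs, one per closed frequent induced subtree found in that document.
    Mining those subtrees is outside the scope of this sketch.
    """
    clusters = defaultdict(list)
    outliers = []
    for doc_id, subtrees in doc_subtrees.items():
        if not subtrees:
            # Assumption: a document containing no closed frequent subtree
            # is treated as an outlier (isolated point).
            outliers.append(doc_id)
            continue
        # Assumption: the representative is the subtree with the most nodes;
        # ties are broken by the canonical string so the result is deterministic.
        representative = max(subtrees, key=lambda t: (t[1], t[0]))[0]
        clusters[representative].append(doc_id)
    # The number of clusters falls out of the grouping, and each key serves
    # as a structural description of its cluster.
    return dict(clusters), outliers


# Hypothetical input: four GML documents with pre-mined subtrees.
docs = {
    "a.gml": [("(Road(Name)(Line))", 3), ("(Road(Name))", 2)],
    "b.gml": [("(Road(Name)(Line))", 3)],
    "c.gml": [("(River(Name))", 2)],
    "d.gml": [],
}
clusters, outliers = cluster_by_representative(docs)
```

Because documents are keyed by their representative tree, no pairwise tree comparison is needed, which matches the property the abstract emphasizes.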