Affiliation: [1] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2006, No. 2, pp. 275-280 (6 pages)
Abstract: Most Web clustering algorithms treat clusters as mutually exclusive and few account for overlap between them, so clustering quality suffers. In practice a single page often belongs to several categories; that is, an indiscernibility relation exists between clusters. Rough set theory, first presented by Professor Pawlak in 1982, is well suited to representing indiscernibility relations between sets. This paper proposes a k-means algorithm for clustering Web search results based on tolerance rough sets. First, Web documents are represented in the vector space model, and a set of feature terms is extracted by conventional methods. Then the co-occurrence of feature terms is used to construct a tolerance relation over terms, which extends each term's power to describe documents. Finally, the similarity between documents is described by term tolerance classes to cluster Web search results, and a simple, intuitive T criterion (T model) for measuring clustering precision is presented. The approach is evaluated on results returned by actual Web search engines and compared with other recent methods. The experiments show that clustering with tolerance relations yields descriptive, easily understood cluster labels and clearly outperforms the ordinary k-means algorithm.
Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
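
The abstract describes building a tolerance relation from term co-occurrence and using term tolerance classes to compare documents, but it gives no formulas. The sketch below illustrates that idea under assumptions that are not taken from the paper: two terms are treated as tolerant when they co-occur in at least `theta` documents, a document is enriched with every term whose tolerance class overlaps it (its upper approximation, as in the commonly used tolerance rough set model), and similarity is measured with the Jaccard coefficient. The function names, the threshold, and the toy documents are all hypothetical, and the k-means step itself is omitted.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the terms co-occurring with it in >= theta documents.

    Assumed definition (not from the abstract): the relation is reflexive
    and symmetric, so each class contains the term itself.
    """
    cooc = defaultdict(int)
    vocab = set()
    for terms in docs:
        vocab |= terms
        # Count each unordered pair of terms once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose tolerance class shares a term with the document."""
    return {t for t, cls in classes.items() if cls & doc}


def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy documents, each reduced to a set of feature terms (hypothetical data).
docs = [
    {"rough", "set", "cluster"},
    {"rough", "set", "tolerance"},
    {"rough", "approximation"},
    {"set", "approximation"},
]

classes = tolerance_classes(docs, theta=2)
enriched = [upper_approximation(d, classes) for d in docs]

plain_sim = jaccard(docs[2], docs[3])             # shares only "approximation"
enriched_sim = jaccard(enriched[2], enriched[3])  # "rough" and "set" are tolerant
print(plain_sim, enriched_sim)
```

In this toy data "rough" and "set" co-occur in two documents, so the third and fourth documents, which share only one literal term, become identical once each is extended with the tolerant term. The enriched sets could then be fed to any clustering routine in place of the raw term sets.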