A Web Search Result Clustering Method Based on Tolerance Rough Sets (cited by: 5)

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2006, No. 2, pp. 275-280 (6 pages)

Abstract: Most Web clustering algorithms treat clusters as strictly mutually exclusive concepts, and few take overlap between clusters into account, so their clustering results are unsatisfactory. In practice, a single page usually falls into several categories; that is, an indiscernibility relation exists between clusters. Rough set theory, first presented by Pawlak in 1982, is well suited to expressing indiscernibility relations between sets. This paper proposes a k-means algorithm for clustering Web search results based on tolerance rough sets. First, Web documents are represented in the vector space model, and the set of feature terms is obtained by conventional methods. Then the co-occurrence of feature terms is used to construct a tolerance relation over terms, which extends the descriptive power of each term. Finally, the term tolerance classes are used to describe the similarity between documents, yielding a Web search result clustering algorithm, and a simple and intuitive T model (T criterion) for measuring clustering precision is presented. The proposed solution is evaluated on search results returned by actual Web search engines and compared with other recent methods. The experimental results show that clustering with tolerance relations produces descriptive, easily understood cluster labels and clearly outperforms the ordinary k-means algorithm.

Keywords: Web; document clustering; rough set; tolerance rough set

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
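
The term-enrichment step summarized in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the definitions here are assumptions taken from the tolerance rough set model as it is commonly formulated: two terms are tolerant if they co-occur in at least `theta` documents, and a document's upper approximation is every term whose tolerance class intersects the document. The function names, the threshold `theta`, the Jaccard similarity, and the toy documents are all hypothetical; the k-means step and the T criterion are not shown.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the terms that co-occur with it in >= theta documents.

    Assumed definition (not given in the abstract): every term is tolerant
    with itself, and the relation is symmetric.
    """
    cooc = defaultdict(int)
    for terms in docs:
        # Count each unordered pair of terms once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    vocab = set().union(*docs)
    classes = {t: {t} for t in vocab}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose tolerance class shares a term with the document."""
    return {t for t, cls in classes.items() if cls & doc}


def jaccard(a, b):
    """Set-overlap similarity, used here only as an illustrative measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy search results, each reduced to a set of feature terms (hypothetical data).
docs = [
    {"jaguar", "car"},
    {"jaguar", "car", "engine"},
    {"jaguar"},
    {"car"},
]
classes = tolerance_classes(docs, theta=2)
enriched = [upper_approximation(d, classes) for d in docs]

raw_sim = jaccard(docs[2], docs[3])               # no shared terms
enriched_sim = jaccard(enriched[2], enriched[3])  # both enriched to the same set
print(raw_sim, enriched_sim)
```

In this toy data the last two documents share no terms, but "jaguar" and "car" co-occur in two documents, so each document's upper approximation gains the other term and the two become comparable. A clustering step such as k-means could then operate on the enriched representations instead of the raw term sets.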
