A Web Search Result Clustering Based on Tolerance Rough Set  Cited by: 5

Authors: Yi Gaoxiang (易高翔) [1], Hu Heping (胡和平) [1]

Affiliation: [1] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2006, No. 2, pp. 275-280 (6 pages)

Abstract: Most Web clustering algorithms treat clusters as mutually exclusive, and few account for overlapping concepts between clusters, so their results are unsatisfactory. In practice, a single page often falls into several categories; that is, an indiscernibility relation exists between clusters. Rough set theory, first presented by Pawlak in 1982, is well suited to representing indiscernibility relations between sets. This paper proposes a k-means algorithm for Web search result clustering based on tolerance rough sets. First, Web documents are represented in the vector space model, and a set of feature terms is obtained by conventional methods. Then the co-occurrence of terms is used to construct a tolerance relation over the terms, which extends the descriptive power of each term. Finally, the similarity between documents is described through term tolerance classes to cluster the Web search results, and a simple, intuitive T criterion (T model) for measuring clustering precision is presented. The method is evaluated on search results returned by actual Web search engines and compared with other recent methods. The experiments show that clustering with tolerance relations yields cluster labels that are descriptive and easy to understand, and that it clearly outperforms the ordinary k-means algorithm.

Keywords: Web documents; clustering; rough sets; tolerance rough sets

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
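
The abstract describes building term tolerance classes from co-occurrence and using them to compare documents. The sketch below illustrates that idea under stated assumptions, not the authors' exact formulation: two terms are assumed tolerant when they co-occur in at least `theta` documents, a document is enriched with every term whose tolerance class intersects it (an upper approximation), and Jaccard overlap stands in for the similarity measure. The names `tolerance_classes`, `upper_approximation`, `jaccard`, and `theta` are hypothetical, and the k-means step and the T criterion are omitted.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class.

    Assumption: two terms are tolerant if they co-occur in at least
    `theta` documents; every term is tolerant with itself.
    """
    cooc = Counter()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    vocab = set().union(*docs)
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """All terms whose tolerance class shares at least one term with doc."""
    return {t for t, cls in classes.items() if cls & doc}


def jaccard(a, b):
    """Set overlap, used here as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy "search results", each reduced to a set of feature terms.
docs = [
    {"jaguar", "car", "engine"},
    {"jaguar", "car", "dealer"},
    {"jaguar", "cat", "jungle"},
    {"car", "engine", "dealer"},
]
classes = tolerance_classes(docs, theta=2)
raw = jaccard(docs[0], docs[3])
enriched = jaccard(
    upper_approximation(docs[0], classes),
    upper_approximation(docs[3], classes),
)
print(raw, enriched)
```

On this toy data the enriched representations of the first and last documents overlap more than their raw term sets do, which is the effect the abstract attributes to extending the descriptive power of terms.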

 
