Research and Realization of a Fast Clustering Method for Web Search Engine Results (Web检索结果快速聚类方法的研究与实现)

Cited by: 2

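Affiliations: [1] College of Information Engineering, Yanshan University, Qinhuangdao, Hebei 066004; [2] Chinese Information Center, Institute of Software, Chinese Academy of Sciences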

Source: Computer Engineering and Design (《计算机工程与设计》), 2004, No. 12, pp. 2231-2233, 2290 (4 pages)

Abstract: To help Web users pick out the documents they need from the large number of snippets returned by a search engine, this paper presents a fast clustering method for Web search results, based on an analysis of the clustering process. Each stage of that process, from building the index model and computing similarities to forming the final clusters, is analyzed and simplified. The similarity between returned results is computed from the information in three parts of each result: the title, the URL, and the snippet. The results returned first are partially clustered by mapping them and their similarities onto an undirected graph; the remaining results are then assigned to the closest cluster to form the final clustering. The method is simple to implement. Experiments show that it responds quickly, yields clusters of relatively high relevance, and uses little space.
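The two-phase procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity formula, the field weights, the threshold, or how clusters are read off the undirected graph. The sketch assumes Jaccard overlap of word sets, hypothetical weights of 0.4/0.2/0.4 for title, URL, and snippet, a hypothetical edge threshold of 0.2, connected components as clusters, and average similarity as the measure of the "closest" cluster.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercase word set of a string (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(r1, r2, weights=(0.4, 0.2, 0.4)):
    """Weighted similarity over title, URL and snippet (weights are assumptions)."""
    w_title, w_url, w_snippet = weights
    return (w_title * jaccard(tokens(r1["title"]), tokens(r2["title"]))
            + w_url * jaccard(tokens(r1["url"]), tokens(r2["url"]))
            + w_snippet * jaccard(tokens(r1["snippet"]), tokens(r2["snippet"])))


def cluster(results, seed_count, threshold=0.2):
    """Cluster the first `seed_count` results on a graph, then assign the rest."""
    if not results:
        return []
    seeds = range(min(max(seed_count, 1), len(results)))

    # Phase 1: undirected graph with an edge between sufficiently similar seeds.
    adjacent = defaultdict(set)
    for i in seeds:
        for j in seeds:
            if i < j and similarity(results[i], results[j]) >= threshold:
                adjacent[i].add(j)
                adjacent[j].add(i)

    # Each connected component of the graph becomes one cluster (assumption).
    clusters, seen = [], set()
    for start in seeds:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacent[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))

    # Phase 2: put each remaining result into the cluster whose current
    # members have the highest average similarity to it.
    def average_similarity(k, members):
        return sum(similarity(results[k], results[m]) for m in members) / len(members)

    for k in range(len(seeds), len(results)):
        best = max(clusters, key=lambda members: average_similarity(k, members))
        best.append(k)
    return clusters
```

Each result is a dictionary with `title`, `url`, and `snippet` keys, and the function returns lists of result indices. Only the seed results are compared pairwise; every later result is compared against existing clusters only, which is one plausible reading of how the method keeps response time low.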

Keywords: clustering method; document; Web retrieval; similarity; search engine; fast; cluster; search results; title; stage

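Classification codes: TP391 [Automation and Computer Technology: Computer Application Technology]; TP311 [Automation and Computer Technology: Computer Science and Technology]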
