Research on the Method of Clustering Web Documents Based on Hyperlink Information (基于超链接信息的Web文本聚类方法研究)

Author: Sun Lina (孙莉娜) [1]

Source: Computer Knowledge and Technology (《电脑知识与技术》), 2006, No. 9, pp. 99-101 (3 pages)

Abstract: Faced with massive volumes of text data, helping people locate exactly the information they need has become an important research direction in text mining. This paper applies text classification and clustering methods to information retrieval, specifically to the clustering of Web page text, and proposes an automatic Web document clustering model based on hyperlink information. Structure mining is used to obtain several authoritative pages in a topic area, which serve as the initial cluster centers. Noise and redundant links are removed from the hyperlink information to obtain a concise topological structure of the website. Clustering is then carried out on the similarity between feature vectors obtained by content mining of hyperlink anchor texts and Web page texts. The cluster centers are adjusted dynamically during this process, so that the pages are finally grouped into different subcategories under each topic.

Keywords: text mining; HITS algorithm; topological structure

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
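
The abstract's first step, using structure mining to pick authoritative pages as initial cluster centers, can be illustrated with the HITS algorithm named in the keywords. The following is a minimal sketch of the standard HITS hub/authority iteration, not the paper's own implementation, which this record does not give. The function names (`hits`, `top_authorities`), the fixed iteration count, and the toy link graph with pages `a` to `e` are all assumptions made for illustration.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Run the HITS iteration on a link graph.

    links: dict mapping each page to the list of pages it links to.
    Returns (authority, hub) score dicts keyed by page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


def top_authorities(links, k):
    """Return the k pages with the highest authority score (ties broken by name)."""
    auth, _ = hits(links)
    return sorted(auth, key=lambda p: (-auth[p], p))[:k]


# Hypothetical toy graph: a, b and e act as hubs pointing at c and d.
toy_links = {"a": ["c", "d"], "b": ["c", "d"], "e": ["c"], "c": [], "d": []}

if __name__ == "__main__":
    # Candidate initial cluster centers for this toy topic.
    print(top_authorities(toy_links, 2))
```

In a model like the one the abstract describes, the pages returned by `top_authorities` would seed the clusters, and the centers would then be adjusted as content similarity is taken into account. That adjustment step is not sketched here because the record gives no detail on it.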
