Research on Web Document Clustering Based on Sentential Maximum Frequent Word Sets

Cited by: 1

Source: 《计算机科学》 (Computer Science), 2007, No. 7, pp. 154-157, 164 (5 pages)

Abstract: Web document clustering is an important research direction in Web mining. The frequent patterns produced by existing mining algorithms are high-dimensional and do not reflect the semantic information expressed by a document well. To obtain more precise clustering results, this paper proposes a sentence-level maximum frequent word set mining method for extracting document feature terms. On this basis, documents are first clustered preliminarily; classes are then merged or split according to thresholds on inter-class distance and intra-class link strength, which yields the final document clustering. Throughout this process, the variable precision rough set model is used to compute the feature vector of each class. Experimental results show that the proposed algorithm outperforms traditional document clustering algorithms.

Keywords: Web; document clustering; rough set; association rules; maximum frequent word set

Classification: TP391.3 [Automation and Computer Technology: Computer Application Technology]
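
The abstract does not give the mining procedure itself, so the following is only a minimal sketch of what sentence-level maximum frequent word set mining could look like, not the paper's algorithm. It assumes that each sentence is treated as one transaction (a set of words), that a word set is frequent when it occurs in at least `min_support` sentences, and that "maximum" is read in the usual sense of a frequent set with no frequent proper superset. The function names, the tokenization rule, and the Apriori-style level-wise search are all illustrative choices.

```python
from itertools import combinations
import re


def split_sentences(text):
    """Split text into sentences, each represented as a set of lowercase words."""
    sentences = re.split(r"[.!?。！？]+", text)
    word_sets = [frozenset(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    return [ws for ws in word_sets if ws]  # drop empty fragments


def maximal_frequent_word_sets(sentences, min_support):
    """Return word sets found in >= min_support sentences that have no
    frequent proper superset (hypothetical helper, brute-force search)."""

    def support(itemset):
        # Number of sentences that contain every word of the itemset.
        return sum(1 for s in sentences if itemset <= s)

    # Level 1: frequent single words.
    words = {w for s in sentences for w in s}
    current = {frozenset([w]) for w in words
               if support(frozenset([w])) >= min_support}
    frequent = set(current)

    # Level-wise growth: join frequent k-sets into candidate (k+1)-sets.
    while current:
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == len(a) + 1}
        current = {c for c in candidates if support(c) >= min_support}
        frequent |= current

    # Keep only sets that are not strictly contained in another frequent set.
    return {f for f in frequent if not any(f < g for g in frequent)}


text = ("Web mining clusters documents. Web mining finds patterns. "
        "Rough sets model clusters.")
sentences = split_sentences(text)
result = maximal_frequent_word_sets(sentences, min_support=2)
```

With `min_support=2`, the words `web` and `mining` co-occur in two sentences and `clusters` occurs in two sentences on its own, so the sketch keeps the sets `{web, mining}` and `{clusters}`. The resulting sets could then serve as document feature terms for the preliminary clustering step described in the abstract; the merge/split step and the variable precision rough set computation are not sketched here because the abstract gives no details for them.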
