Web document clustering based on web-log mining (cited by: 5)

Affiliation: [1] School of Computer and Communication Engineering, Liaoning Petrochemical University (辽宁石油化工大学), Fushun, Liaoning 113001

Source: Computer Engineering and Design (《计算机工程与设计》), 2008, No. 18, pp. 4708-4710 (3 pages)

Abstract: Web log mining is one branch of web mining. This paper introduces the general process of web log mining, studies the k-means clustering algorithm, and analyzes its shortcomings. In every iteration, k-means must compute the distance from each data object to the cluster centroids, which makes clustering inefficient. To address this problem, an improved k-means algorithm is proposed that avoids repeatedly computing the distances from data objects to cluster centroids. Web document clustering is implemented with both algorithms, and the experimental results show that the improved algorithm increases clustering efficiency.
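The abstract does not state how the improved algorithm avoids the repeated distance computations. As a rough illustration only, the sketch below assumes one well-known shortcut: cache each object's distance to its assigned centroid, and after the centroids move, recompute all distances for an object only if it has drifted farther from its own centroid. The function name `kmeans_cached` and its helpers are hypothetical and not taken from the paper. The shortcut is a heuristic, so its result may differ slightly from standard k-means.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def nearest(p, centroids):
    """Return (index, distance) of the centroid closest to p."""
    ds = [dist(p, c) for c in centroids]
    j = min(range(len(ds)), key=ds.__getitem__)
    return j, ds[j]


def kmeans_cached(points, init_centroids, max_iter=100):
    """k-means with cached distances (assumed shortcut, not the paper's code)."""
    centroids = [tuple(c) for c in init_centroids]

    # First pass: a full assignment, remembering each object's distance
    # to the centroid it was assigned to.
    labels, cached = [], []
    for p in points:
        j, d = nearest(p, centroids)
        labels.append(j)
        cached.append(d)

    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            new_centroids.append(mean(members) if members else c)
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids

        # Assignment step with the shortcut: one distance per object
        # unless the object moved away from its own centroid.
        for i, p in enumerate(points):
            d = dist(p, centroids[labels[i]])
            if d <= cached[i]:
                cached[i] = d  # still at least as close: keep the label
            else:
                labels[i], cached[i] = nearest(p, centroids)

    return labels, centroids
```

For example, `kmeans_cached(points, init_centroids)` with points given as tuples of numbers returns one cluster label per point together with the final centroids. In iterations where an object stays close to its centroid, only one distance is computed for it instead of one per cluster.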

Keywords: log mining; web log; k-means; document clustering; log preprocessing

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
