Data Preprocessing Methods in Web Usage Mining (cited by: 2)

Source: Journal of Zhengzhou University of Light Industry: Natural Science, 2010, No. 4, pp. 71-74 (4 pages)

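Funding: Research project funded by the Hunan Provincial Department of Education (08C335); Key Teaching Research and Reform Project of Hunan University of Science and Technology (G30946)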

Abstract: This paper reviews recent research on the five main steps of data preprocessing in Web usage mining: data cleaning, user identification, session identification, path completion, and transaction identification. The path completion algorithm that combines site topology with the referrer page, and the transaction identification algorithm based on maximal forward reference, each rely on a single identifying feature and place high demands on the training data set, so they remain some distance from practical application. To address this, the paper proposes optimizations for user identification, session identification, and transaction identification: combining Cookie technology with heuristic rules, using a dynamic time threshold, and fusing multiple features.
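As a rough illustration of two of the steps named in the abstract, the sketch below shows session identification and transaction identification by maximal forward reference. It is not the paper's implementation: the function names, the `(timestamp, page)` record layout, and the fixed 30-minute timeout are assumptions made for this example (the abstract itself recommends a dynamic time threshold, which is not reproduced here).

```python
def split_sessions(requests, timeout=1800):
    """Split one user's time-ordered (timestamp, page) requests into sessions.

    A new session starts whenever the gap between two consecutive requests
    exceeds `timeout` seconds. The fixed 1800 s value is an assumption.
    """
    sessions = []
    current = []
    last_time = None
    for timestamp, page in requests:
        if last_time is not None and timestamp - last_time > timeout:
            sessions.append(current)
            current = []
        current.append((timestamp, page))
        last_time = timestamp
    if current:
        sessions.append(current)
    return sessions


def maximal_forward_references(pages):
    """Split a session's page sequence into maximal forward references.

    A backward reference (revisiting a page already on the current path)
    ends the current forward run; the path walked so far is emitted as one
    transaction, then truncated back to the revisited page.
    """
    transactions = []
    path = []
    moving_forward = False
    for page in pages:
        if page in path:
            # Backward reference: emit the path only if we were moving forward.
            if moving_forward:
                transactions.append(list(path))
            del path[path.index(page) + 1:]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        transactions.append(list(path))
    return transactions
```

For example, the hypothetical traversal `A B C D C B E G H G W A O U O V` yields the transactions `ABCD`, `ABEGH`, `ABEGW`, `AOU`, and `AOV`. Because each transaction depends only on the order of page visits, this method uses a single feature, which is the limitation the abstract points out.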

Keywords: Web mining; Web logs; data preprocessing

Classification code: TP392 [Automation and Computer Technology: Computer Application Technology]
