An Algorithm for Mining Frequent Itemsets in Data Streams

Author: Meng Caixia (孟彩霞)[1]

Source: Journal of Kunming University of Science and Technology (Natural Science Edition), 2009, No. 5, pp. 26-30, 35 (6 pages)

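Funding: National Natural Science Foundation of China (Grant No. 60573096); Natural Science Foundation of Shaanxi Province (Grant No. 2004f283); Xi'an Science and Technology Innovation Support, Applied Development Research Program (Grant No. YF07024)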

Abstract: Unlike data in a traditional static database, a data stream is an ordered sequence of itemsets that arrive over time, which makes classical frequent itemset mining algorithms difficult to apply to data streams. Based on the characteristics of data streams, this paper proposes FP-SegCount, an algorithm for mining frequent itemsets from data streams. The algorithm partitions the data stream into segments and uses a modified FP-growth algorithm to mine the frequent itemsets in each segment. It then counts the itemsets with a Count-Min Sketch. The algorithm addresses the problems of compressed counting and of fast, efficient computation. In an experimental comparison with the FP-DS algorithm, FP-SegCount is shown to have good time efficiency.
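
The pipeline outlined in the abstract (segment the stream, mine each segment, accumulate counts in a Count-Min Sketch) can be illustrated with a minimal Python sketch. This is not the paper's FP-SegCount implementation: the abstract does not describe the modified FP-growth step, so a brute-force enumeration stands in for it here, and all names and parameters (CountMinSketch, segments, frequent_in_segment, mine_stream, width, depth, max_len) are hypothetical choices made for illustration only. Items are assumed to be strings.

```python
import hashlib
from collections import Counter
from itertools import combinations


class CountMinSketch:
    """Fixed-size approximate counter; estimates never fall below the true count."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One cell per row, chosen by hashing the key together with the row index.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "little") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The minimum over rows is the cell least inflated by hash collisions.
        return min(self.table[row][col] for row, col in self._cells(key))


def segments(stream, size):
    """Yield consecutive lists of at most `size` transactions."""
    segment = []
    for transaction in stream:
        segment.append(transaction)
        if len(segment) == size:
            yield segment
            segment = []
    if segment:
        yield segment


def frequent_in_segment(segment, min_count, max_len=3):
    """Brute-force stand-in (an assumption) for the modified FP-growth step."""
    counts = Counter()
    for transaction in segment:
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    return {itemset: c for itemset, c in counts.items() if c >= min_count}


def mine_stream(stream, segment_size, min_count, sketch):
    """Add each segment's locally frequent itemsets to the sketch."""
    for segment in segments(stream, segment_size):
        for itemset, count in frequent_in_segment(segment, min_count).items():
            sketch.add(",".join(itemset), count)
    return sketch


stream = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["a", "b"]]
sketch = mine_stream(stream, segment_size=2, min_count=2, sketch=CountMinSketch())
```

In this simplified version, an itemset that falls below min_count within a segment contributes nothing for that segment, so sketch.estimate can undercount an itemset's true stream-wide support even though the sketch itself never undercounts what was added to it. How FP-SegCount handles this case is not stated in the abstract.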

Keywords: data streams; data mining; data stream mining; frequent itemsets

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
