改进的多数据流协同频繁项集挖掘算法  被引量:4

Improved algorithm for mining collaborative frequent itemsets in multiple data streams

在线阅读下载全文

作  者:王鑫[1,2] 刘方爱[1,2] 

机构地区:[1]山东师范大学信息科学与工程学院,济南250014 [2]山东省分布式计算机软件新技术重点实验室(山东师范大学),济南250014

出  处:《计算机应用》2016年第7期1988-1992,共5页journal of Computer Applications

基  金:国家自然科学基金资助项目(61572301;90612003);山东省自然科学基金资助项目(ZR2013FM008)~~

摘  要:针对已有的多数据流协同频繁项集挖掘算法存在内存占用率高以及发现频繁项集效率低的问题,提出了改进的多数据流协同频繁项集挖掘(MCMD-Stream)算法。首先,该算法利用单遍扫描数据库的字节序列滑动窗口挖掘算法发现数据流中的潜在频繁项集和频繁项集;其次,构建类似频繁模式树(FP-Tree)的压缩频繁模式树(CPTree)存储已发现的潜在频繁项集和频繁项集,同时更新CP-Tree树中每个节点生成的对数倾斜时间表中的频繁项计数;最后,通过汇总分析得出在多条数据流中多次出现的且有价值的频繁项集,即协同频繁项集。相比A-Stream和HStream算法,MCMD-Stream算法不仅能够提高多数据流中协同频繁项集挖掘的效率,并且还降低了内存空间的使用率。实验结果表明MCMD-Stream算法能够有效地应用于多数据流的协同频繁项集挖掘。In view of low memory usage rate and inefficient discovery for mining frequent itemsets in multiple data streams, an improved Mining Collaborative frequent itemsets in Multiple Data Stream( MCMD-Stream) algorithm was proposed. Firstly, the window sliding based on bit-sequence technique was utilized, which was a single-pass algorithm to find the potential and frequent itemsets. Secondly, Compressed frequent Pattern Tree( CP-Tree), which is similar to Frequent Pattern Tree( FP-Tree), was constructed to store the potential and frequent itemsets. And each node in the CP-Tree could generate the logarithmic tilted window to save the counts of frequent itemsets. Finally, the valuable frequent itemsets that appeared repeatedly in multiple data streams, namely collaborative frequent itemsets, were got. Compared to A-Stream and HStream algorithms, MCMD-Stream algorithm can improve the mining efficiency of collaborative frequent itemsets in multiple data streams, and also reduce the usage of the memory space. The experimental results show that MCMD-Stream algorithm can efficiently be applied to mine the collaborative frequent itemsets in multiple data streams.

关 键 词:流数据挖掘 多数据流 滑动窗口 频繁项集 协同频繁项集 

分 类 号:TP301.6[自动化与计算机技术—计算机系统结构]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象