在线挖掘数据流闭频繁项集的高效算法  被引量:2

Efficient Algorithm for Online Mining Closed Frequent Itemsets over Data Streams

在线阅读下载全文

作  者:毛伊敏[1] 陈志刚[2] 

机构地区:[1]江西理工大学应科院,赣州341000 [2]中南大学信息学院,长沙410083

出  处:《计算机科学》2013年第2期229-234,共6页Computer Science

基  金:国家自然基金项目(51164012基金项目号);江西教育厅科技项目(GJJ12347号)资助

摘  要:数据流闭频繁项集挖掘算法得到了广泛的研究,其中一个典型的工作就是NewMoment算法。针对New-Moment算法存在搜索空间大而造成算法时间效率低的问题,提出了一种改进的数据流闭频繁项集挖掘算法A-New-Moment。它设计了一个二进制位表示项目与扩展的频繁项目列表相结合的数据结构,来记录数据流信息及闭频繁项集。在窗体初始阶段,首先挖掘频繁1-项集所产生的支持度为最大的最长闭频繁项集,接着提出新的"不需扩展策略"和"向下扩展策略"来避免生成大量中间结果,快速发现其余闭频繁项集,达到极大缩小搜索空间的目的。在窗体滑动阶段,提出"动态不频繁剪枝策略"来从已生成的闭频繁项集中快速删除非闭频繁项集,并提出"动态不搜索策略"来动态维护所有闭频繁项集的生成,以降低闭频繁项集的维护代价,提高算法的效率。理论分析与实验结果表明,A-New-Moment算法具有较好的性能。Mining closed frequent itemsets from data streams has been extensively studied, in which NewMoment is re- garded as a typical algorithm. However, there is the problem of its big search space causing bad performance in time in NewMoment. This paper presented an algorithm called A-NewMoment which ameliorates NewMoment to mine closed frequent itemsets. Firstly, it designed a combinative data structure which uses an effective bit-victor to represent items and an extended frequent item list to record the current closed frequent information in streams. Secondly, the new pru- ning strategies called WSS and CSS were proposed to avoid a large number of intermediate results generated, so the search space is reduced greatly. Finally, the pruning strategy called DNFIPS was also proposed to delete no closed fre- quent itemsets from HTC. At the same time, it also designed a novel strategy called DNSS to efficiently and dynamically maintain these operations that all closed frequent itemsedts are added and deleted. Theoretical analysis and experimental results show that the proposed method is efficient.

关 键 词:数据挖掘 数据流 频繁项集 闭频繁项集 

分 类 号:TP182[自动化与计算机技术—控制理论与控制工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象