Parallel Mining Algorithm of Closed Frequent Itemsets in the Data Stream

Cited by: 1

Authors: FENG Zhonghui; YIN Shaohong[1] (School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin 300387, China)

Affiliation: [1] Department of Software Engineering, Tianjin Polytechnic University

Source: Software Engineering, 2018, No. 8, pp. 10-14 (5 pages)

Abstract: Closed frequent itemsets retain complete information about the frequent itemsets while producing far fewer patterns than full frequent itemset mining, which to some extent reduces memory overhead and improves running time. The characteristics of data streams call for more efficient mining algorithms, so this paper applies a divide-and-conquer strategy and proposes PCFI, a parallel algorithm for mining closed frequent itemsets. PCFI stores the transactions of each itemset in vertical data format, so the support count of an itemset can be obtained quickly through set operations on transaction sets. Frequent items that share the same transaction set are merged to form the initial generators, which shrinks the search space. The initial generators are then processed in parallel under the divide-and-conquer strategy, yielding a reduced pre-set and a reduced post-set for each one. During mining, the search space of every generator is pruned continually to obtain an ever smaller reduced post-set, which cuts down the processing of redundant data. Experimental analysis shows that the algorithm outperforms previously designed algorithms.
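
The abstract does not give PCFI's pseudocode, so the following is only a minimal Python sketch, under stated assumptions, of the ideas it names: tidsets in vertical format, support counting by tidset intersection, merging frequent items with identical tidsets into initial generators, and a closed-itemset search that keeps a pre-set (for duplicate checks) and a shrinking post-set, in the style of the published DCI-Closed algorithm. Each initial generator becomes one independent task. All function names (`vertical_format`, `initial_generators`, `pcfi_sketch`, and the helpers) are hypothetical, the support-ascending item order is an assumption, and the sliding-window maintenance of the data stream is omitted; this is not the authors' implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def vertical_format(transactions):
    """Vertical layout: map each item to its tidset (ids of transactions containing it)."""
    tids = defaultdict(set)
    for tid, tx in enumerate(transactions):
        for item in tx:
            tids[item].add(tid)
    return {item: frozenset(s) for item, s in tids.items()}


def initial_generators(transactions, minsup):
    """Keep frequent items and merge items whose tidsets are identical.

    Returns (items, tidset) units in a fixed order. Support-ascending order is an
    assumption of this sketch; any fixed total order keeps the search correct.
    """
    groups = defaultdict(set)
    for item, tidset in vertical_format(transactions).items():
        if len(tidset) >= minsup:  # support count = size of the tidset
            groups[tidset].add(item)
    gens = [(frozenset(items), tidset) for tidset, items in groups.items()]
    gens.sort(key=lambda g: (len(g[1]), sorted(g[0])))
    return gens


def _expand(gens, closed, closed_tids, pre, i, rest, minsup, out):
    """Try to extend the closed set `closed` with unit i.

    `pre` holds earlier units not in `closed` (pre-set), `rest` later units (post-set).
    Returns True when closed+{i} is frequent, so the caller adds i to its pre-set.
    """
    tids = closed_tids & gens[i][1]  # support via tidset intersection
    if len(tids) < minsup:
        return False
    # Duplicate check: an earlier unit covering all these transactions means the
    # resulting closed set is generated in another branch, so it is skipped here.
    if any(tids <= gens[j][1] for j in pre):
        return True
    new_closed = set(closed) | gens[i][0]
    new_post = []  # reduced post-set for the child search
    for j in rest:
        if tids <= gens[j][1]:
            new_closed |= gens[j][0]  # j belongs to the closure
        else:
            new_post.append(j)
    out.append((frozenset(new_closed), len(tids)))
    _mine(gens, new_closed, tids, pre, new_post, minsup, out)
    return True


def _mine(gens, closed, closed_tids, pre, post, minsup, out):
    """Depth-first search over the post-set of one closed set."""
    pre = list(pre)  # local copy: siblings extend it, the caller's list is untouched
    for k, i in enumerate(post):
        if _expand(gens, closed, closed_tids, pre, i, post[k + 1:], minsup, out):
            pre.append(i)


def _root_task(args):
    """One independent task per initial generator (divide and conquer)."""
    gens, root, all_tids, post, k, minsup = args
    out = []
    _expand(gens, root, all_tids, post[:k], post[k], post[k + 1:], minsup, out)
    return out


def pcfi_sketch(transactions, minsup, executor_cls=ThreadPoolExecutor, workers=4):
    """Return (closed itemset, support) pairs with support >= minsup."""
    gens = initial_generators(transactions, minsup)
    all_tids = frozenset(range(len(transactions)))
    # Units present in every transaction form the closure of the empty itemset.
    root = frozenset().union(*(items for items, t in gens if t == all_tids))
    post = [k for k, (_, t) in enumerate(gens) if t != all_tids]
    results = [(root, len(all_tids))] if root else []
    tasks = [(gens, root, all_tids, post, k, minsup) for k in range(len(post))]
    with executor_cls(max_workers=workers) as ex:
        for part in ex.map(_root_task, tasks):
            results.extend(part)
    return results
```

For example, `pcfi_sketch([{"x", "y", "z"}, {"x", "y"}, {"x", "y", "z"}], 2)` merges `x` and `y` (identical tidsets) into one generator and reports `{x, y}` with support 3 and `{x, y, z}` with support 2. Threads keep the sketch easy to run but give little CPU parallelism in CPython; passing `executor_cls=ProcessPoolExecutor` should work because the tasks share only immutable frozensets, though the call then belongs under an `if __name__ == "__main__":` guard.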

Keywords: data stream; sliding window; vertical data format; parallel computing; closed frequent itemsets

CLC number: TP311.5 [Automation and Computer Technology / Computer Software and Theory]