Research of Frequent Items Mining Approximate Algorithm Based on Progressive Sampling (基于渐近取样的频繁项集挖掘近似算法)  Cited by: 2


Authors: Kan Baopeng (阚宝朋)[1], Cui Li (崔利)[2]

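Affiliations: [1] School of Computer and Communication Engineering, Huaian Vocational College of Information Technology, Huai'an, Jiangsu 223003; [2] School of Information and Electronic Engineering, Henan University of Animal Husbandry and Economy, Zhengzhou 450044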

Source: Control Engineering of China (《控制工程》), 2017, No. 9, pp. 1786-1791 (6 pages)

Abstract: To improve the performance of frequent itemset mining, this paper proposes FIMAA-PS (Frequent Itemsets Mining Approximate Algorithm based on Progressive Sampling). The algorithm draws samples from the dataset by progressive sampling and automatically sets the sample size for the next mining iteration from the output on the current sample. It uses the Rademacher average to derive a theoretical upper bound on the frequency deviation of the output, which yields the stopping condition, and it checks that condition with a single fast scan of the sample before outputting the mining results. Experimental results show that, compared with traditional exact mining algorithms and approximate mining algorithms based on static sampling, FIMAA-PS has a significant advantage in both output accuracy and running time.
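The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' FIMAA-PS implementation. The function names, the geometric sample-size schedule (`growth`), the cap `n_max`, the restriction to itemsets of at most `max_len` items, and the Monte Carlo estimate of the empirical Rademacher average are all assumptions made for illustration. The constants in the deviation bound follow a textbook-style form and ignore Monte Carlo error, so the bound should be read as indicative only.

```python
import math
import random
from collections import Counter
from itertools import combinations


def itemsets_of(transaction, max_len):
    """Yield every itemset of size 1..max_len contained in a transaction."""
    items = sorted(transaction)
    for k in range(1, max_len + 1):
        yield from combinations(items, k)


def sample_frequencies(sample, max_len):
    """One scan of the sample: relative frequency of each itemset seen."""
    counts = Counter()
    for t in sample:
        counts.update(itemsets_of(t, max_len))
    return {s: c / len(sample) for s, c in counts.items()}


def rademacher_bound(sample, max_len, delta, trials, rng):
    """Assumed bound: 2 * R_hat + 3 * sqrt(ln(4 / delta) / (2 n)), where R_hat
    is a Monte Carlo estimate of the empirical Rademacher average."""
    n = len(sample)
    total = 0.0
    for _ in range(trials):
        signed = Counter()
        for t in sample:
            sigma = rng.choice((-1, 1))  # one random sign per transaction
            for s in itemsets_of(t, max_len):
                signed[s] += sigma
        # Itemsets absent from the sample contribute 0 to the supremum.
        total += max(0, max(signed.values(), default=0)) / n
    r_hat = total / trials
    return 2 * r_hat + 3 * math.sqrt(math.log(4 / delta) / (2 * n))


def progressive_mine(dataset, min_freq, eps, delta=0.1, n0=100, growth=2,
                     n_max=50000, max_len=2, trials=10, seed=0):
    """Grow the sample until the deviation bound is at most eps (or n_max is
    reached), then report itemsets whose sample frequency is >= min_freq - eps."""
    rng = random.Random(seed)
    n, round_no = n0, 1
    while True:
        sample = [rng.choice(dataset) for _ in range(n)]  # with replacement
        # Split delta across rounds so the per-round bounds can hold jointly.
        bound = rademacher_bound(sample, max_len, delta / 2 ** round_no,
                                 trials, rng)
        if bound <= eps or n >= n_max:
            freqs = sample_frequencies(sample, max_len)
            result = {s: f for s, f in freqs.items() if f >= min_freq - eps}
            return result, n, bound
        n, round_no = min(n * growth, n_max), round_no + 1


if __name__ == "__main__":
    data = [{"a", "b"}, {"a"}, {"a", "c"}, {"a", "b", "c"}] * 50
    itemsets, n_used, bound = progressive_mine(data, min_freq=0.4, eps=0.2)
    print(n_used, round(bound, 3), sorted(itemsets))
```

The lowered threshold `min_freq - eps` is a common design choice in sampling-based mining: if every sample frequency is within `eps` of the true frequency, no truly frequent itemset is missed, at the cost of some extra candidates near the threshold.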

Keywords: frequent itemset mining; approximate algorithm; progressive sampling; Rademacher average

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
