Frequent pattern mining algorithm from uncertain data based on pattern-growth (times cited: 7)

Source: Journal of Computer Applications (《计算机应用》), 2015, No. 7, pp. 1921-1926 (6 pages)

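Funding: National Natural Science Foundation of China (61370200); Natural Science Foundation of Ningbo (2013A610115; 2014A610073); General Research Project of the Education Department of Zhejiang Province (Y201432717); Bulk Commodity Special Research Project of Ningbo Dahongying University (1320133004)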

Abstract: To improve the time and space efficiency of Frequent Pattern (FP) mining over uncertain data, the Uncertain Frequent Pattern Mining based on Max Probability (UFPM-MP) algorithm was proposed. First, the expected support number of an itemset was estimated from the maximum probability value in the transaction itemset. Second, this estimate was compared with the minimum expected support number threshold to decide whether the itemset is a candidate frequent itemset. Finally, a sub-tree was built for each candidate itemset to mine frequent patterns recursively. UFPM-MP was compared with the state-of-the-art AT (Array based tail node Tree structure)-Mine algorithm on six typical datasets. The experimental results show that UFPM-MP improves both time and space efficiency: by about 30% on sparse datasets and, more markedly, by about 3 to 4 times on dense datasets. The expected support number estimation strategy effectively reduces the number of sub-trees and header-table items, which is what improves the time and space efficiency; the lower the minimum expected support threshold, or the more frequent patterns there are to mine, the larger the gain in time efficiency.
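The abstract describes the screening idea only at a high level. The Python sketch below illustrates it under assumptions of our own: each transaction is a dict mapping an item to its existential probability, items are independent, and the expected support of an itemset is the sum, over the transactions containing it, of the product of its item probabilities. The estimate p(item) × max(prefix probabilities) computed by the hypothetical `estimated_support` function is one plausible reading of "estimating from the maximum probability", not the paper's published formula, and the sketch grows patterns over projected transaction lists rather than the sub-trees and header tables that UFPM-MP builds.

```python
from math import prod
from typing import Dict, FrozenSet, List

# One uncertain transaction: item -> existential probability in (0, 1].
Transaction = Dict[str, float]


def expected_support(db: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Exact expected support, assuming items are independent: the sum over
    transactions containing the itemset of the product of its probabilities."""
    return sum(prod(t[i] for i in itemset) for t in db if itemset.issubset(t))


def estimated_support(db: List[Transaction], prefix: FrozenSet[str], item: str) -> float:
    """Hypothetical max-probability estimate for prefix | {item}.

    In each transaction, the product of the prefix probabilities is at most
    their maximum (every factor is <= 1), so p(item) * max(prefix) never
    underestimates the exact contribution and is safe for pruning.
    This bound is an assumption, not UFPM-MP's published formula."""
    total = 0.0
    for t in db:
        if item in t and prefix.issubset(t):
            cap = max((t[x] for x in prefix), default=1.0)
            total += t[item] * cap
    return total


def mine(db: List[Transaction], min_esup: float) -> Dict[FrozenSet[str], float]:
    """Depth-first pattern growth over projected transaction lists
    (a simplification: UFPM-MP builds sub-trees instead)."""
    items = sorted({i for t in db for i in t})
    result: Dict[FrozenSet[str], float] = {}

    def grow(prefix: FrozenSet[str], cond_db: List[Transaction], start: int) -> None:
        # cond_db contains only transactions that hold every item of prefix.
        for k in range(start, len(items)):
            item = items[k]
            # Screening step: the estimate is an upper bound, so an extension
            # rejected here cannot be frequent and needs no projection.
            if estimated_support(cond_db, prefix, item) < min_esup:
                continue
            candidate = prefix | {item}
            projected = [t for t in cond_db if item in t]
            esup = expected_support(projected, candidate)
            if esup >= min_esup:
                result[candidate] = esup
                # Expected support is anti-monotone, so only frequent
                # patterns are extended further.
                grow(candidate, projected, k + 1)

    grow(frozenset(), db, 0)
    return result


# Toy uncertain database (made-up data for illustration only).
db: List[Transaction] = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.7, "c": 0.6},
    {"b": 0.4, "c": 0.9},
]

if __name__ == "__main__":
    for pattern, esup in sorted(mine(db, 0.8).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(pattern), round(esup, 2))
```

With this made-up database and a minimum expected support of 0.8, the sketch reports {a}, {b}, {c} and {a, c}; the extensions {a, b} (estimate 0.72) and {b, c} (estimate 0.76) are rejected by the estimate before any projection is built. Because the bound never falls below the exact value, screening with it cannot drop a frequent pattern; a tighter bound would prune more.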

Keywords: uncertain data; frequent pattern; frequent itemset; pattern growth

CLC number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
