A Depth-First Algorithm for Mining Maximal Patterns from Uncertain Data (cited by: 3)

A Depth-first Algorithm for Mining Uncertain Data Maximal Model

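Affiliations: [1] School of Information Security Engineering, Shanghai Jiao Tong University, Shanghai 200240; [2] Cyber Security Corps, Shanghai Municipal Public Security Bureau, Shanghai 200000; [3] Beijing VRV Software Corporation Limited, Beijing 100081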

Source: Computer Engineering, 2015, No. 7, pp. 204-209 (6 pages)

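Funding: National Key Technology R&D Program of the Ministry of Science and Technology of China (2011BAK13B05); Program for New Century Excellent Talents of the Ministry of Education (NCET-12-0358); Key Project of the Scientific Research Innovation Fund of the Shanghai Science and Technology Commission (12ZZ019); Shanghai Science and Technology Program (13JG0500400)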

Abstract: Uncertain data mining has become a research hotspot in data mining, but few algorithms address maximal frequent itemsets over uncertain data. Based on the characteristics of uncertain data, this paper extends GenMax, an algorithm for mining maximal frequent patterns in deterministic data, to uncertain data and proposes the U-GenMax algorithm. The algorithm extends the Tid set by adding a probability field to the id field, which converts the data to a vertical format. When deciding whether an itemset is frequent, it applies two preliminary checks that prune infrequent itemsets before any confidence is computed, which reduces the amount of computation compared with computing the confidence directly. It also introduces a stack-based multistep rollback pruning strategy, which avoids the limitation that GenMax can only roll back one step at a time. Experimental results show that U-GenMax performs well and is suitable for sparse datasets in all tested cases and for dense datasets when the support threshold is high.
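As a rough illustration of the ideas named in the abstract, the following Python sketch converts uncertain transactions to a vertical format in which each Tid entry carries a probability, and runs a cheap check before computing the confidence that an itemset is frequent. It is not the authors' U-GenMax implementation. The function names, the assumption that item probabilities are independent, the count-based pre-check, and the dynamic-programming computation are all assumptions made for illustration; the abstract does not say which two preliminary checks the paper uses.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Convert horizontal uncertain transactions into vertical format.

    transactions: list of dicts mapping item -> existence probability.
    Returns a dict mapping item -> {tid: probability} (an extended Tid set).
    """
    vertical = defaultdict(dict)
    for tid, items in enumerate(transactions):
        for item, prob in items.items():
            vertical[item][tid] = prob
    return dict(vertical)


def intersect(a, b):
    """Join two extended Tid sets.

    Assumption: items are independent, so probabilities multiply.
    """
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller set
    return {tid: p * b[tid] for tid, p in a.items() if tid in b}


def frequent_probability(tidset, min_count):
    """Return P(support >= min_count) for an extended Tid set."""
    if min_count <= 0:
        return 1.0
    # dp[k] = P(exactly k containing transactions so far) for k < min_count,
    # dp[min_count] = P(at least min_count containing transactions so far).
    dp = [1.0] + [0.0] * min_count
    for p in tidset.values():
        dp[min_count] += dp[min_count - 1] * p
        for k in range(min_count - 1, 0, -1):
            dp[k] = dp[k] * (1.0 - p) + dp[k - 1] * p
        dp[0] *= 1.0 - p
    return dp[min_count]


def is_probably_frequent(tidset, min_count, min_conf):
    """Decide frequentness, with a cheap check before the exact computation."""
    # Illustrative pre-check (an assumption, not taken from the paper):
    # support can never exceed the number of Tids in the set.
    if len(tidset) < min_count:
        return False
    return frequent_probability(tidset, min_count) >= min_conf


if __name__ == "__main__":
    data = [{"a": 0.9, "b": 0.8}, {"a": 0.5}, {"a": 1.0, "b": 0.5}]
    vertical = to_vertical(data)
    ab = intersect(vertical["a"], vertical["b"])
    print(ab, is_probably_frequent(ab, 2, 0.3))
```

The pre-check rejects an itemset without touching any probabilities, which is the general reason such checks can save work compared with always computing the confidence.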

Keywords: uncertain data; frequent itemset; maximal pattern; vertical format; pruning strategy; confidence

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
