Mining Algorithm of Frequent Closed Itemsets Based on Uncertain Data (基于不确定性数据的频繁闭项集挖掘算法)  Cited by: 1

Authors: ZHANG Shuyun (章淑云), ZHANG Shouzhi (张守志) [1]

Affiliation: [1] School of Computer Science, Fudan University, Shanghai 200433, China

Source: Computer Engineering (《计算机工程》), 2014, No. 3, pp. 51-54 (4 pages)

Abstract: For uncertain data, the traditional test of whether an itemset is frequent cannot express how reliable that judgment is, and for large datasets the full set of frequent itemsets is large and redundant. To address both problems, this paper proposes UFCIM, an algorithm for mining frequent closed itemsets from uncertain data, built on the level-wise mining algorithm Apriori. UFCIM uses a probability of confidence to express how reliably an itemset is frequent: the higher the probability of confidence, the more likely the itemset is frequent. Because frequent closed itemsets are a compact, lossless representation of frequent itemsets, the algorithm reports them in place of the much larger set of frequent itemsets. Experimental results show that UFCIM mines frequent closed itemsets from uncertain data quickly, reducing itemset redundancy while preserving accuracy and completeness.
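The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the UFCIM algorithm, whose details the abstract does not give. The sketch assumes an uncertain database in which each transaction maps an item to its existence probability, and it assumes items and transactions are independent. The function names `occurrence_probs`, `frequent_probability` and `closed_itemsets` are hypothetical. The closedness filter uses the standard definition on exact support counts (an itemset is closed if no proper superset has the same support).

```python
from math import prod


def occurrence_probs(db, itemset):
    """Probability that `itemset` occurs in each transaction.

    Assumption of this sketch: items in a transaction are independent,
    so the probability is the product of the item probabilities.
    """
    return [prod(t.get(i, 0.0) for i in itemset) for t in db]


def frequent_probability(probs, min_sup):
    """P(support >= min_sup), assuming transaction i contains the itemset
    independently with probability probs[i]."""
    # dist[k] = probability that exactly k transactions seen so far contain
    # the itemset; all counts >= min_sup are pooled in dist[min_sup].
    dist = [1.0] + [0.0] * min_sup
    for p in probs:
        new = [0.0] * (min_sup + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)           # itemset absent
            new[min(k + 1, min_sup)] += q * p  # itemset present
        dist = new
    return dist[min_sup]


def closed_itemsets(supports):
    """Keep itemsets that have no proper superset with the same support.

    `supports` maps frozenset -> exact support count.
    """
    return {
        x: s
        for x, s in supports.items()
        if not any(x < y and s == sy for y, sy in supports.items())
    }


# Toy uncertain database (made-up probabilities, for illustration only).
db = [{"a": 0.9, "b": 0.8}, {"a": 0.5, "b": 0.5}, {"a": 1.0}]
p_ab = frequent_probability(occurrence_probs(db, ("a", "b")), min_sup=1)
print(round(p_ab, 2))  # 0.79

# Toy exact supports: {b} is not closed because {a, b} has the same support.
supports = {frozenset("a"): 3, frozenset("b"): 2, frozenset("ab"): 2}
print(sorted(sorted(x) for x in closed_itemsets(supports)))  # [['a'], ['a', 'b']]
```

An itemset would then be reported as frequent when `frequent_probability` exceeds a chosen confidence threshold; how UFCIM combines that test with closedness and Apriori-style level-wise candidate generation is described in the paper itself, not here.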

Keywords: uncertain data; frequent closed itemsets; data mining; level-wise mining; probability of confidence

CLC number: TP311.12 [Automation and Computer Technology: Computer Software and Theory]

 
