A Weighted Frequent Itemset Mining Algorithm Based on Top-K Queries (cited by: 2)

A Frequent Itemset Mining Algorithm for Uncertain Data Based on Top-K Queries

Authors: ZHAO Xue-jian [1], XIONG Xiao-xiao, ZHANG Xin-hui [2], SUN Zhi-xin [1]

Affiliations: [1] School of Modern Post, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; [2] School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China

Source: Computer Technology and Development (《计算机技术与发展》), 2019, No. 7, pp. 49-54 (6 pages)

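Funding: National Natural Science Foundation of China (61373135, 61672299); National Natural Science Foundation of China, Young Scientists Fund (61702281, 20140883); Basic Research Program of Jiangsu Province (Natural Science Foundation) (BK20140883, BK20140894, BK20150869)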

Abstract: Data mining plays an increasingly important role in decision support across many industries, and frequent itemset mining, one of its most active research areas, has a wide range of applications. In recent years, with the rapid development of information collection and data processing technologies, frequent itemset mining for uncertain data has attracted considerable attention. In weighted frequent itemset mining over uncertain datasets, however, the introduction of item weights means that weighted frequent itemsets no longer satisfy the downward closure property. The search space of frequent itemsets therefore cannot be pruned using that property, which results in low time efficiency. This paper proposes TK-FIM (top-k frequent itemset mining), a weighted frequent itemset mining algorithm for uncertain data based on Top-K queries, which reduces the number of candidate weighted frequent itemsets, narrows the search space, and improves search efficiency. Experimental results on both real-life and synthetic datasets show that TK-FIM has good time performance.
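The abstract's point that item weights break the downward closure property can be illustrated with a small sketch. It is not the TK-FIM algorithm, whose details are not given on this page. It assumes one common formulation: the expected support of an itemset is the sum, over transactions containing it, of the product of its items' existence probabilities; an itemset's weight is the mean of its items' weights; and the weighted expected support is the product of the two. The dataset, the weights, the threshold of 1.0, and all function names are hypothetical.

```python
from heapq import nlargest
from itertools import combinations

# Hypothetical uncertain dataset: each transaction maps item -> existence probability.
TRANSACTIONS = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.8, "b": 0.9},
    {"a": 0.9},
    {"a": 0.7, "b": 0.9},
]
# Hypothetical item weights.
WEIGHTS = {"a": 0.2, "b": 1.0}


def expected_support(itemset, transactions):
    """Sum over transactions of the product of the itemset's item probabilities."""
    total = 0.0
    for t in transactions:
        if all(item in t for item in itemset):
            prob = 1.0
            for item in itemset:
                prob *= t[item]
            total += prob
    return total


def itemset_weight(itemset, weights):
    """Assumed definition: mean of the weights of the items in the itemset."""
    return sum(weights[item] for item in itemset) / len(itemset)


def weighted_expected_support(itemset, transactions, weights):
    return itemset_weight(itemset, weights) * expected_support(itemset, transactions)


def top_k_brute_force(transactions, weights, k):
    """Score every non-empty itemset and keep the k best (exhaustive, not TK-FIM)."""
    items = sorted(weights)
    scored = [
        (c, weighted_expected_support(c, transactions, weights))
        for size in range(1, len(items) + 1)
        for c in combinations(items, size)
    ]
    return nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    threshold = 1.0  # hypothetical minimum weighted expected support
    for itemset, score in top_k_brute_force(TRANSACTIONS, WEIGHTS, k=3):
        print(itemset, round(score, 3), "frequent" if score >= threshold else "infrequent")
```

With these numbers, {a} scores about 0.66 and falls below the threshold, while its superset {a, b} scores about 1.242 and passes. A frequent itemset therefore has an infrequent subset, so Apriori-style pruning on subsets would wrongly discard {a, b}. The exhaustive enumeration in `top_k_brute_force` grows exponentially with the number of items, which is the kind of cost a Top-K approach aims to avoid.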

Keywords: Top-K; weighted frequent itemset; downward closure property; uncertain data; data mining

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
