A Top-k Frequent Itemset Mining Algorithm Based on Cross Entropy

Authors: SONG Wei [1], ZHENG Chuanlong (School of Information Science and Technology, North China University of Technology, Beijing 100144, China)

Affiliation: [1] School of Information Science and Technology, North China University of Technology, Beijing 100144, China

Source: Journal of Zhengzhou University (Natural Science Edition), 2022, No. 4, pp. 27-33 (7 pages)

Funding: National Natural Science Foundation of China (61977001); Beijing Great Wall Scholar Training Program (CIT&TCD20190305).

Abstract: Mining top-k frequent itemsets by specifying the desired number of result itemsets avoids the difficulty of setting a support threshold in frequent itemset mining. Because heuristic methods can obtain a sufficient number of accurate results in a short time, heuristic itemset mining has attracted increasing attention; however, few studies have applied heuristic methods to top-k frequent itemset mining. This paper proposes KCE, a top-k frequent itemset mining algorithm based on cross entropy. First, a modeling method for applying cross entropy to top-k frequent itemset mining is given. Second, a search-space pruning strategy based on filtering support is proposed. Third, a strategy that uses bitwise crossover to generate the next generation of itemsets is designed to improve sample diversity. Experimental results show that KCE has advantages in both running time and memory consumption, and that the average accuracy of the mining results is above 95%.

Keywords: data mining; top-k frequent itemset; cross entropy; filtering support; bitwise crossover

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
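The abstract names three components: a cross-entropy model, pruning by filtering support, and bitwise crossover. The record does not give their details, so the following Python sketch is only an illustration of how such pieces could fit together, not the KCE algorithm itself. Every name and parameter here (`ce_top_k`, `n_samples`, `elite_frac`, `alpha`, the uniform crossover, and the pruning rule that uses the current k-th best support as the filter) is an assumption made for this example.

```python
import random


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def bitwise_crossover(a, b, rng):
    """Uniform bitwise crossover (assumed variant): copy each bit from a or b."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def ce_top_k(transactions, k, n_samples=60, elite_frac=0.2, iters=30,
             alpha=0.7, seed=0):
    """Hypothetical cross-entropy search for k high-support itemsets."""
    rng = random.Random(seed)
    tx = [frozenset(t) for t in transactions]
    items = sorted(set().union(*tx))
    item_sup = [support(frozenset([i]), tx) for i in items]
    p = [0.5] * len(items)   # p[j]: probability that item j is sampled
    found = {}               # itemset -> support, for everything evaluated
    elite = []
    for _ in range(iters):
        # Draw bit vectors from the current probability vector.
        samples = [[1 if rng.random() < pj else 0 for pj in p]
                   for _ in range(n_samples)]
        # Add offspring of the previous elite to keep the sample diverse.
        if len(elite) >= 2:
            for _ in range(n_samples // 4):
                a, b = rng.sample(elite, 2)
                samples.append(bitwise_crossover(a, b, rng))
        scored = []
        for bits in samples:
            itemset = frozenset(i for i, b in zip(items, bits) if b)
            if not itemset:
                continue
            sup = found.get(itemset)
            if sup is None:
                sup = found[itemset] = support(itemset, tx)
            scored.append((sup, bits))
        if not scored:
            continue
        # Elite = best-scoring fraction; move p toward its bit frequencies.
        scored.sort(key=lambda r: r[0], reverse=True)
        elite = [bits for _, bits in
                 scored[:max(1, int(elite_frac * len(scored)))]]
        for j in range(len(items)):
            freq = sum(bits[j] for bits in elite) / len(elite)
            p[j] = alpha * freq + (1 - alpha) * p[j]
        # Assumed pruning rule: an item whose own support is below the
        # current k-th best support cannot appear in any top-k itemset.
        if len(found) >= k:
            threshold = sorted(found.values(), reverse=True)[k - 1]
            for j, s in enumerate(item_sup):
                if s < threshold:
                    p[j] = 0.0
    ranked = sorted(found.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return ranked[:k]


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "c"}]
    for itemset, sup in ce_top_k(data, k=3):
        print(sorted(itemset), sup)
```

The pruning step relies only on the anti-monotonicity of support (a superset never has higher support than any of its items), so it is safe regardless of how the paper defines filtering support. Because the search is a heuristic, the returned itemsets are not guaranteed to be the exact top-k.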
