Authors: SONG Wei[1], ZHENG Chuanlong (School of Information Science and Technology, North China University of Technology, Beijing 100144, China)
Source: Journal of Zhengzhou University: Natural Science Edition, 2022, No. 4, pp. 27-33 (7 pages)
Funding: National Natural Science Foundation of China (61977001); Beijing Great Wall Scholar Training Program (CIT&TCD20190305).
Abstract: Top-k frequent itemset mining asks the user for the desired number of result itemsets, which sidesteps the difficulty of choosing a support threshold in frequent itemset mining. Heuristic methods have drawn growing attention for itemset mining because they can return a sufficient number of accurate results in a short time, yet few studies have applied them to top-k frequent itemsets. This paper proposes KCE, a top-k frequent itemset mining algorithm based on cross entropy. First, a modeling approach for applying cross entropy to top-k frequent itemset mining is given. Second, a search-space pruning strategy based on filtering support is proposed. Third, a strategy that generates the next generation of itemsets by bitwise crossover is designed to improve sample diversity. Experimental results show that KCE has advantages in both running time and memory consumption, and that the average accuracy of its mining results is above 95%.
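Keywords: data mining; top-k frequent itemsets; cross entropy; filtering support; bitwise crossover
Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]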
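
To make the problem statement in the abstract concrete, the sketch below shows what "top-k frequent itemsets" means: the caller supplies k instead of a support threshold and receives the k itemsets with the highest support. This is an exhaustive baseline written only for illustration; it is not the KCE algorithm, and it uses neither cross entropy, filtering support, nor bitwise crossover. The function names, the tie-breaking order, and the toy transactions are all assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def top_k_frequent_itemsets(transactions, k):
    """Return the k itemsets with the highest support by exhaustive enumeration.

    Hypothetical baseline: exponential in the number of distinct items,
    so it is only usable on tiny inputs.
    """
    items = sorted(set().union(*transactions))
    scored = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s > 0:
                scored.append((combo, s))
    # Assumed tie-break: higher support first, then smaller itemsets,
    # then lexicographic order, so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[1], len(pair[0]), pair[0]))
    return scored[:k]


# Toy data, invented for this example.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c"}),
]
result = top_k_frequent_itemsets(transactions, k=4)
for itemset, count in result:
    print(itemset, count)
```

An exact result set like this one is also the natural reference against which a heuristic miner's output could be compared, although the abstract does not state how its accuracy figure is computed.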