Closed High Utility Itemsets Mining over Data Stream Based on Sliding Window Model

Cited by: 15

Authors: Cheng Haodong; Han Meng [1]; Zhang Ni; Li Xiaojuan; Wang Le (College of Computer Science and Engineering, North Minzu University, Yinchuan 750021)

Affiliation: [1] College of Computer Science and Engineering, North Minzu University, Yinchuan 750021

Source: Journal of Computer Research and Development, 2021, No. 11, pp. 2500-2514 (15 pages)

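Funding: National Natural Science Foundation of China (62062004); Natural Science Foundation of Ningxia (2020AAC03216); Graduate Innovation Project of North Minzu University (YCX20077).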

Abstract: Mining high utility itemsets from a data stream is challenging because incoming data must be processed in real time under time and memory constraints. Data stream mining usually produces a large number of redundant itemsets. To reduce the number of such itemsets while preserving a lossless compression of the complete set of high utility itemsets, closed itemsets can be mined instead; the set of closed itemsets can be several orders of magnitude smaller than the complete set. To address this problem, an algorithm for mining closed high utility itemsets over a data stream based on the sliding window model, CHUI_DS, is proposed. CHUI_DS introduces a new utility-list structure that speeds up batch insertion and deletion. In addition, pruning strategies are applied to improve the closed itemset mining process and to eliminate potential low-utility candidates. Extensive experiments on real and synthetic datasets show the efficiency and feasibility of the algorithm. In terms of speed, it outperforms previously proposed algorithms, which mainly run in batch mode. It is also suitable for sliding windows of different sizes and scales well with the number of transactions.
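To make the terms in the abstract concrete, the following is a minimal brute-force sketch of what "closed high utility itemsets over a sliding window of batches" means. It is not the CHUI_DS algorithm: it uses no utility list and no pruning, and it simply enumerates every itemset. It assumes the standard definitions (an itemset is closed if no proper superset has the same support; it is high utility if its total utility reaches a threshold). The `PROFIT` table, the function names, and the sample batches are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import combinations

# Hypothetical unit profits (external utilities) for three items.
PROFIT = {"a": 5, "b": 2, "c": 1}


def utility(itemset, transaction):
    """Utility of an itemset in one transaction (0 if not fully contained).

    A transaction maps each item to its purchased quantity.
    """
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * PROFIT[item] for item in itemset)


def closed_high_utility_itemsets(window, min_util):
    """Brute-force closed high utility itemsets for the current window.

    `window` is an iterable of batches; each batch is a list of transactions.
    Returns a dict mapping frozenset(itemset) -> total utility.
    """
    transactions = [t for batch in window for t in batch]
    items = sorted({item for t in transactions for item in t})

    # Support and total utility of every itemset that occurs at least once.
    stats = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            support = sum(1 for t in transactions if all(i in t for i in combo))
            if support:
                total = sum(utility(combo, t) for t in transactions)
                stats[frozenset(combo)] = (support, total)

    result = {}
    for itemset, (support, total) in stats.items():
        # Closed: no proper superset occurs in exactly the same number of transactions.
        is_closed = not any(
            itemset < other and other_support == support
            for other, (other_support, _) in stats.items()
        )
        if is_closed and total >= min_util:
            result[itemset] = total
    return result


# A sliding window holding the two most recent batches; appending a third
# batch makes the deque drop the oldest one automatically.
window = deque(maxlen=2)
window.append([{"a": 1, "b": 2}, {"a": 2, "b": 1, "c": 3}])
window.append([{"b": 4, "c": 1}])
print(closed_high_utility_itemsets(window, min_util=15))

window.append([{"a": 1, "c": 2}])  # the first batch slides out
print(closed_high_utility_itemsets(window, min_util=15))
```

The enumeration is exponential in the number of items, which is the cost that a utility-list structure and pruning strategies, as described in the abstract, are meant to avoid.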

Keywords: pattern mining; data stream mining; closed high utility itemsets; sliding window; utility list

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
