检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:杨鸿茜 武优西 耿萌 刘靖宇 李艳[2] YANG Hongxi;WU Youxi;GENG Meng;LIU Jingyu;LI Yan(School of Artificial Intelligence,Hebei University of Technology,Tianjin 300401,China;School of Economics and Management,Hebei University of Technology,Tianjin 300401,China)
机构地区:[1]河北工业大学人工智能与数据科学学院,天津300401 [2]河北工业大学经济管理学院,天津300401
出 处:《计算机工程》2024年第3期60-67,共8页Computer Engineering
基 金:国家自然科学基金(61976240)。
摘 要:间隙约束序列模式挖掘作为序列模式挖掘的一个重要分支,可以发现模式在序列中的重复出现。然而,当前研究主要针对单项序列进行挖掘,并且序列中每一项都被认为具有相同意义。为解决该问题,提出一次性弱间隙序列模式挖掘(OWP)算法,该算法由准备阶段、支持度计算和候选模式生成3个步骤组成。在准备阶段,建立倒排索引,并对不频繁的项进行剪枝;在支持度计算方面,利用倒排索引结构记录出现位置,避免对原始数据集的重复扫描;在候选模式生成方面,采用模式连接策略,减少冗余候选模式的生成。在项集序列和单项序列共6个真实数据集上的实验结果表明,OWP算法相比OWP-p、Ows-OWP和OWP-e算法在运行时间上分别提升了2.653、1.348、3.592倍,在内存消耗上分别减少了3.51%、0.07%、5%,说明OWP算法可以更高效地挖掘出用户感兴趣的模式。此外,OWP算法在以D1数据集为基础的6倍大小的数据集上的运行时间比D1数据集增长了3.763倍,内存消耗增长了2.310倍,运行时间和内存消耗的增加倍数均小于数据集大小的增加倍数,说明OWP算法具有良好的可扩展性。Sequential Pattern Mining(SPM)with gap constraint,as an important branch of sequential pattern mining,can discover the repetitive occurrences of patterns in sequences.However,the current studies mainly focus on datasets with items,and each item in the sequence is considered to have the same effects.To address this problem,the One-off Weak gap sequential Pattern mining(OWP)algorithm is proposed,which includes three steps:preparation stage,support calculation,and candidate pattern generation.In the preparation stage,an inverted index is constructed and infrequent items are pruned.For support calculation,the occurrence positions are recorded using an inverted index,which avoids repeated scanning of the original dataset.For candidate pattern generation,a pattern-join strategy is used to reduce the generation of redundant candidate patterns.Finally,from the experimental results on six real datasets comprising datasets with items and itemsets,it is observed that the OWP algorithm improves the runtime by 2.653,1.348,and 3.592 times compared with the OWP-p,Ows-OWP,and OWP-e algorithms,respectively,and as concerns memory usage,it reduces by 3.51%,0.07%,and 5%,respectively.This indicates that the OWP algorithm can mine the patterns of user interest more efficiently.In addition,the OWP algorithm on the dataset with six times the size based on D1 increases by 3.763 times and memory usage increases by 2.310 times compared to that of D1.Clearly,both the increases in runtime and memory usage are smaller than those in dataset size,implying that the OWP algorithm has good scalability.
关 键 词:序列模式挖掘 项集挖掘 间隙约束 一次性条件 弱间隙约束
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.229