Authors: Tong Yongxin (童咏昕)[1], Zhang Yuanyuan (张媛媛)[2], Yuan Mei (袁玫)[3], Ma Shilong (马世龙)[1], Yu Dan (余丹)[1], Zhao Li (赵莉)[1]
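Affiliations: [1] State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191; [2] China Academy of Telecommunications Technology, Beijing 100191; [3] College of Information, Beijing Union University, Beijing 100084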
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2010, No. 1, pp. 72-80 (9 pages)
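Funding: National Basic Research Program of China (973 Program) (2005CB321902); Science and Technology Program of the Beijing Municipal Commission of Education (KM200911417003)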
Abstract: Mining frequent sequential patterns from sequence databases is a central research topic in data mining, and a variety of efficient sequential pattern mining algorithms have been proposed and studied. Because the mining process generates a huge number of frequent sequential patterns, many researchers have recently shifted their attention from the efficiency of mining algorithms to how to make the resulting pattern set easier for users to understand. This paper studies the problem of compressing frequent sequential patterns. Inspired by the idea of compressing frequent itemsets, it proposes an algorithm, CFSP (compressing frequent sequential patterns), that mines a small number of representative sequential patterns to express the information in all frequent sequential patterns while eliminating a large number of redundant ones. CFSP is a two-step algorithm: in the first step, it obtains all closed sequential patterns as the candidate set of representative sequential patterns and, at the same time, obtains most of the representative patterns; in the second step, it spends only a small amount of time finding the remaining representative sequential patterns. An empirical study on both real and synthetic data sets shows that CFSP is efficient.
Keywords: sequential pattern mining; compression; frequent pattern mining; association rules; data mining
Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
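
The abstract says that the first step of CFSP collects all closed sequential patterns as candidates for the representative set. As a rough illustration of that notion only, the sketch below enumerates frequent subsequences by brute force and keeps those with no proper super-sequence of equal support. It is not the CFSP algorithm: the function names, the toy database, and the simplification of each sequence to a tuple of single items are all assumptions made for this example, and the second step (choosing representative patterns) is left out because the record does not describe it.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_patterns(database, min_support):
    """Brute-force enumeration of frequent subsequences (toy inputs only)."""
    candidates = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                candidates.add(tuple(seq[i] for i in idx))
    result = {}
    for pattern in candidates:
        count = support(pattern, database)
        if count >= min_support:
            result[pattern] = count
    return result


def closed_patterns(frequent):
    """Keep patterns that have no proper super-sequence with the same support."""
    closed = {}
    for pattern, count in frequent.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_count == count
            and is_subsequence(pattern, other)
            for other, other_count in frequent.items()
        )
        if not absorbed:
            closed[pattern] = count
    return closed


# Hypothetical toy database: four sequences over the items a, b, c.
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
frequent = frequent_patterns(database, min_support=2)
print(sorted(closed_patterns(frequent).items()))
```

On this toy input, seven subsequences are frequent at a minimum support of 2, and four of them are closed: (c), (a, c), (b, c), and (a, b, c). The patterns (a), (b), and (a, b) are dropped because a longer pattern has the same support, which is the kind of redundancy the abstract refers to.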