A Sequential Pattern Mining Algorithm Based on Large-Itemset Reuse  Cited by: 10


Authors: Song Shijie[1], Hu Huaping[1], Zhou Jiawei[1], Jin Shiyao[1]

Affiliation: [1] School of Computer Science, National University of Defense Technology, Changsha 410073

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2006, No. 1, pp. 68-74 (7 pages)

Funding: National Natural Science Foundation of China (60573136); National High-Tech Research and Development Program of China ("863" Program) (2003AA142010)

Abstract: This paper presents HVSM, a sequential pattern mining algorithm based on large-itemset reuse that scans the database first horizontally and then vertically. The algorithm redefines the length of a sequential pattern, which increases the granularity of sequential pattern mining. Representing the database as a vertical bitmap, HVSM first extends itemsets horizontally and mines all large itemsets, which are called one-large-sequence itemsets. It then extends sequences vertically, treating each one-large-sequence itemset as a building block ("container") that is reused when mining k-large-sequences. Candidate large sequences are generated by taking sibling nodes as seeds for child nodes, and support is counted by recording the 1st-TID. Experiments show that, for medium-sized and large transaction databases, HVSM finds frequent sequences faster than the SPAM algorithm.
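The abstract names three mechanisms: a vertical bitmap representation, horizontal itemset extension, and vertical sequence extension that reuses already-mined itemsets. The following Python sketch illustrates those ideas on a toy database. It is not the authors' HVSM implementation: the data, the function names, and the way the first matching transaction index is used (a SPAM-style "everything after the first occurrence" mask) are all assumptions made for illustration, since the abstract does not specify how 1st-TID is applied.

```python
from collections import defaultdict

# Hypothetical toy sequence database: customer id -> ordered list of transactions.
DB = {
    1: [{"a", "b"}, {"c"}, {"a", "c"}],
    2: [{"a"}, {"b", "c"}],
    3: [{"b"}, {"a", "b"}, {"c"}],
}


def build_vertical_bitmaps(db):
    """Map each item to {customer id: int bitmap}.

    Bit t of a bitmap is set when the item occurs in transaction t
    of that customer's sequence.
    """
    bitmaps = defaultdict(dict)
    for cid, transactions in db.items():
        for t, itemset in enumerate(transactions):
            for item in itemset:
                bitmaps[item][cid] = bitmaps[item].get(cid, 0) | (1 << t)
    return dict(bitmaps)


def support(bitmap):
    """Number of customers whose sequence contains the pattern at least once."""
    return sum(1 for bits in bitmap.values() if bits)


def itemset_extend(left, right):
    """Horizontal step: keep transactions that contain both patterns (bitwise AND)."""
    out = {}
    for cid in left.keys() & right.keys():
        bits = left[cid] & right[cid]
        if bits:
            out[cid] = bits
    return out


def first_tid(bits):
    """Index of the lowest set bit, i.e. the first transaction matching the pattern."""
    return (bits & -bits).bit_length() - 1


def sequence_extend(prefix, itemset_bitmap):
    """Vertical step: the itemset must occur strictly after the prefix's first match.

    `itemset_bitmap` may belong to a single item or to a whole itemset mined
    earlier, so a mined itemset is reused as one unit.
    """
    out = {}
    for cid in prefix.keys() & itemset_bitmap.keys():
        # Mask selecting every transaction after the prefix's first occurrence.
        after_first = ~((1 << (first_tid(prefix[cid]) + 1)) - 1)
        bits = itemset_bitmap[cid] & after_first
        if bits:
            out[cid] = bits
    return out


bitmaps = build_vertical_bitmaps(DB)
ab = itemset_extend(bitmaps["a"], bitmaps["b"])   # itemset (a b)
ab_then_c = sequence_extend(ab, bitmaps["c"])     # sequence <(a b)(c)>
print(support(ab), support(ab_then_c))            # prints "2 2"
```

In this sketch the bitmap of the itemset (a b) is computed once and then passed unchanged into the sequence step, which is one simple way to picture a mined itemset being reused as a building block during sequence extension.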

Keywords: sequential pattern mining; bitmap representation; itemset extension; sequence extension

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
