Mining of Association Rules Based on the Operators of Set of Item Sequences  Cited by: 37


Authors: Mao Guojun (毛国君)[1], Liu Chunnian (刘椿年)[1]

Affiliation: [1] College of Computer Science, Beijing University of Technology (北京工业大学计算机学院)

Source: Chinese Journal of Computers (《计算机学报》), 2002, No. 4, pp. 417-422 (6 pages)

Funding: National Natural Science Foundation of China (60173014); Beijing Natural Science Foundation (4022003); Beijing Municipal Education Commission funding

Abstract: Generating the maximal frequent set of item sequences is the key problem in mining association rules, and traditional algorithms solve it by scanning the transaction database many times. Recent research has begun to pursue higher efficiency by reducing the number of database scans and, with it, the I/O cost of the mining process. As computer performance improves, it has become feasible to look for data structures that support efficient algorithms based on a single scan of the transaction database. This paper first gives rigorous definitions of the set of item sequences and its basic operations, and on that basis proposes an algorithm, called ISS-DM, for generating the maximal frequent set of item sequences. ISS-DM arrives at the maximal frequent set of item sequences incrementally during one scan of the transaction database.

Discovering the frequent set of item sequences in a transaction database is one of the most important tasks in mining association rules. Many algorithms have been proposed in the literature, but most of them are based on the Apriori method of pruning the itemset lattice, which requires repeated passes over the transaction database. Recent algorithms have attempted to improve mining efficiency by reducing the number of database passes in order to control I/O cost. In this paper, we first define the Set of Item Sequences and its basic properties, then introduce operators designed for mining association rules. Let $ISS_1$ and $ISS_2$ be variables ranging over sets of item sequences, and let $IS$ be a variable ranging over item sequences. The main operators are defined as follows:

(1) $IS \in_{sub} ISS_1 \iff \exists\, IS_1 \in ISS_1$ such that $IS \subseteq IS_1$;
(2) $ISS_1 \subseteq_{sub} ISS_2 \iff \forall\, IS_1 \in ISS_1$, $IS_1 \in_{sub} ISS_2$;
(3) $ISS_1 \cap_{sub} ISS_2 = \{\, IS \mid IS \in_{sub} ISS_1 \text{ and } IS \in_{sub} ISS_2 \,\}$;
(4) $ISS_1 \cup_{sub} ISS_2 = \{\, IS \mid IS \in_{sub} ISS_1 \text{ or } IS \in_{sub} ISS_2 \,\}$.

Based on these definitions, we propose a new efficient algorithm called ISS-DM, which avoids repeatedly scanning the transaction database when mining association rules. Unlike existing algorithms, which are based on pruning the itemset lattice or on improvements of that method, our algorithm uses only two linear in-memory data structures ($ISS$ and $ISS^*$), and in some cases it achieves higher mining efficiency with less storage than other algorithms. Finally, the effectiveness of the algorithm is analyzed and some experimental results are given. The experiments show that ISS-DM is efficient on transaction databases of moderate size and on some particular large databases.
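The four operators listed in the abstract can be illustrated with a small Python sketch. This is not the authors' ISS-DM algorithm, which the abstract does not spell out; it only shows the operators as membership tests. Two assumptions are made here: an item sequence is modeled as a `frozenset` of items, so that "contained in" becomes the ordinary subset test, and the relation symbols lost in the extracted text are read as "there exists" in (1) and "for all" in (2). All function names (`in_sub`, `subset_sub`, `in_intersection_sub`, `in_union_sub`, `seq`) are hypothetical.

```python
from typing import FrozenSet, Set

# Assumption: an item sequence is modeled as a frozenset of item labels.
ItemSeq = FrozenSet[str]
ItemSeqSet = Set[ItemSeq]


def in_sub(item_seq: ItemSeq, iss: ItemSeqSet) -> bool:
    """Operator (1): item_seq is contained in at least one member of iss."""
    return any(item_seq <= member for member in iss)


def subset_sub(iss1: ItemSeqSet, iss2: ItemSeqSet) -> bool:
    """Operator (2): every member of iss1 satisfies operator (1) against iss2."""
    return all(in_sub(member, iss2) for member in iss1)


def in_intersection_sub(item_seq: ItemSeq, iss1: ItemSeqSet, iss2: ItemSeqSet) -> bool:
    """Membership test for operator (3): covered by both iss1 and iss2."""
    return in_sub(item_seq, iss1) and in_sub(item_seq, iss2)


def in_union_sub(item_seq: ItemSeq, iss1: ItemSeqSet, iss2: ItemSeqSet) -> bool:
    """Membership test for operator (4): covered by iss1 or by iss2."""
    return in_sub(item_seq, iss1) or in_sub(item_seq, iss2)


def seq(items: str) -> ItemSeq:
    """Helper: build an item sequence from a string of one-letter items."""
    return frozenset(items)


# Made-up example data, for illustration only.
iss1 = {seq("abc"), seq("de")}
iss2 = {seq("abcd"), seq("def")}

print(in_sub(seq("ab"), iss1))                     # True: {a, b} is inside {a, b, c}
print(subset_sub(iss1, iss2))                      # True: abc inside abcd, de inside def
print(subset_sub(iss2, iss1))                      # False: abcd is inside no member of iss1
print(in_intersection_sub(seq("d"), iss1, iss2))   # True: d is inside de and inside abcd
print(in_union_sub(seq("f"), iss1, iss2))          # True: f is inside def
```

Operators (3) and (4) are written as membership tests rather than as materialized sets, because the sets they define contain every item sequence covered by the operands and would be expensive to enumerate.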

Keywords: data mining; association rules; set of item sequences; frequent set of item sequences; algorithm; database

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
