An Efficient Incremental Updating Algorithm for Discovering Sequential Patterns in Large Database (大型数据库中的高效序列模式增量式更新算法)  Cited by: 10


Authors: Zou Xiang (邹翔)[1], Zhang Wei (张巍)[1], Cai Qingsheng (蔡庆生)[1], Wang Qingyi (王清毅)[1]

Affiliation: [1] Department of Computer Science, University of Science and Technology of China, Hefei 230027

Source: Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), 2003, No. 2, pp. 165-171 (7 pages)

Funding: National Natural Science Foundation of China (70171052; 60075015)

Abstract: An incremental updating algorithm for sequential patterns, called FIMS (fast incremental mining of sequential patterns), is proposed to handle the maintenance of discovered sequential patterns when the database is updated. The main idea is to reuse the results of an earlier mining run and to build a projected database, which reduces both the number of scans over the whole database and the number of candidate sequences generated, and so lowers the cost of finding new sequential patterns in the updated database. First, the whole database, which consists of the original database and the incremental database, is scanned twice and a projected database is constructed from it. Then the projected database is mined to obtain all new candidate sequential patterns. Finally, the whole database is scanned once more to obtain all new sequential patterns. Because FIMS scans the whole database only three times in total and the projected database is much smaller than the whole database, database scanning and the growth of candidate sequences are greatly reduced, which improves mining efficiency. Experiments show that FIMS outperforms the GSP algorithm by a factor of 4 to 7 when the updated data is only a small portion of the whole database.
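The abstract's core idea, reusing earlier mining results instead of recounting everything, can be illustrated with a small sketch. This is not the FIMS algorithm itself, whose details (including the projected database) are not given in this record. It only shows how the support counts of previously found patterns could be refreshed by scanning the increment alone. The function names, the single-item sequence elements, and the assumption that the increment consists solely of new sequences are all illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(sequence: Sequence[str], pattern: Pattern) -> bool:
    """True if `pattern` occurs in `sequence` as an ordered,
    not necessarily contiguous, subsequence."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(db: List[Sequence[str]], pattern: Pattern) -> int:
    """Number of sequences in `db` that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in db)


def update_known_patterns(
    old_counts: Dict[Pattern, int],
    increment: List[Sequence[str]],
    total_size: int,
    min_support_ratio: float,
) -> Dict[Pattern, int]:
    """Refresh supports of previously frequent patterns.

    Assumption (hypothetical setup): the increment only adds new
    sequences, so new support = old support + support in increment.
    Only the increment is scanned; the original database is not.
    """
    threshold = min_support_ratio * total_size
    updated = {}
    for pattern, old_count in old_counts.items():
        new_count = old_count + support(increment, pattern)
        if new_count >= threshold:
            updated[pattern] = new_count
    return updated


# Supports found earlier on a hypothetical original database of 3 sequences.
old_counts = {("a", "c"): 2, ("b", "c"): 2}
increment = [["a", "b"], ["a", "c", "b"]]
still_frequent = update_known_patterns(old_counts, increment, 5, 0.6)
```

This sketch covers only patterns that were already known. Discovering patterns that become frequent only after the update requires further work, which is the part the abstract attributes to the projected database and the final verification scan.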

Keywords: database; incremental updating algorithm; data mining; sequential patterns; number of scans; candidate sequences

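Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; TP18 [Automation and Computer Technology: Computer Science and Technology]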

 
