CBC-DS:基于频繁闭模式的数据流分类算法  被引量:3

CBC-DS:A Classification Algorithm Based on Closed Frequent Patterns for Mining Data Streams

在线阅读下载全文

作  者:敖富江[1] 王涛[2] 刘宝宏[1] 黄柯棣[1] 

机构地区:[1]国防科学技术大学机电工程与自动化学院,长沙410073 [2]国防科学技术大学计算机学院,长沙410073

出  处:《计算机研究与发展》2009年第5期779-786,共8页Journal of Computer Research and Development

基  金:国家自然科学基金项目(60573057;60704038)~~

摘  要:基于关联规则的分类算法通常根据频繁模式生成类关联规则,但频繁模式挖掘易遭受组合爆炸问题,影响算法效率.并且数据流的出现也对分类算法提出了新的挑战.相对于频繁模式,频繁闭模式的数目较少,挖掘频繁闭模式的算法通常具有较高的效率.为此,提出了一种高效的基于频繁闭模式的数据流分类算法—CBC-DS.主要贡献在于:1)提出了一种基于逆文法顺序FP-Tree的频繁闭项集单遍挖掘过程,用于挖掘类关联规则,该过程采用了一种混合项顺序搜索策略以满足数据流挖掘的单遍性需求,并采用位图技术提高效率;2)提出了"自支持度"概念,用于筛选规则以提高算法分类精度.实验表明,位图技术能够提高算法速度2倍以上,利用自支持度能够提高算法平均精度0.5%左右;最终CBC-DS算法的平均分类精度比经典算法CMAR高1%左右,并且CBC-DS算法的规则挖掘速度远快于CMAR算法.The classification algorithms based on association rules generally generate classification association rules by frequent patterns. As mining frequent patterns often suffer from the problem of combinatorial explosion, the efficiency of the algorithms is low. Moreover, the emergence of data streams has posed new challenges for classification algorithms. In contrast to frequent patterns, the number of closed frequent patterns is less, so that the efficiency of algorithms for mining closed frequent patterns is higher. A novel and efficient closed-frequent-patterns based classification algorithm, CBC-DS, is proposed for classifying data streams. The contributions are listed as follows. (1) a single-pass closed frequent itemsets mining process based on reverse lexicographic order FP-tree is introduced for mining classification association rules, which uses a kind of mixed item-ordering searching policy to satisfy the single-pass requirement of data streams and uses the bitmap technology to improve the efficiency; (2) the concept of self-support for filtering rules is proposed to improve the precision. The experimental results show that the bitmap technology can improve the efficiency of the algorithm about twice at least and the average classifying precision can be improved about 0. 5% by using self-support. Eventually, the average precision of CBC-DS is about 1% higher than that of CMAR, and CBC-DS is much faster than CMAR.

关 键 词:数据流 分类 关联规则 频繁闭模式 自支持度 

分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象