Ensemble classification algorithm for data stream based on repeatability of concept (一种基于概念重复性的数据流集成分类算法)  Cited by: 2


Authors: Yin Shaohong (尹绍宏)[1], Zhang Panpan (张盼盼)[1]

Affiliation: [1] Tianjin Polytechnic University (天津工业大学), Tianjin 300387

Source: Computer Engineering and Applications (《计算机工程与应用》), 2016, No. 12, pp. 80-84 (5 pages)

Abstract: Research on classifying data streams with concept drift has produced many results, but most methods do not fully account for concepts that recur in the stream. Ignoring recurrence consumes substantial computation and memory and increases the likelihood of classification errors. To address this, the paper proposes an ensemble classification algorithm for data streams based on the repeatability of concepts. The algorithm uses ensemble classification to handle concept drift, but during learning it does not delete concepts that have temporarily become invalid, nor their corresponding base classifiers. Instead, it stores their essential information for later reuse, and it predicts the upcoming concept from the transition relations between concepts. The stated result is an improvement in both classification accuracy and time efficiency. Experimental results verify the effectiveness of the algorithm.
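The abstract names two mechanisms without giving implementation details: keeping inactive concepts and their base classifiers for later reuse, and predicting the next concept from observed transitions. The following is only a minimal sketch of those two ideas under stated assumptions, not the authors' algorithm. The class `ConceptPool` and its methods are hypothetical names, concepts are assumed to carry known identifiers, and drift detection and ensemble weighting are left out entirely.

```python
from collections import Counter, defaultdict


class ConceptPool:
    """Hypothetical store of seen concepts and their base classifiers."""

    def __init__(self):
        # concept id -> stored base classifier (kept even when inactive)
        self.classifiers = {}
        # concept id -> counts of the concepts that followed it
        self.transitions = defaultdict(Counter)
        self.current = None

    def store(self, concept_id, classifier):
        """Keep a classifier for a concept instead of deleting it."""
        self.classifiers[concept_id] = classifier

    def switch_to(self, concept_id):
        """Record a change of concept and update the transition counts."""
        if self.current is not None and concept_id != self.current:
            self.transitions[self.current][concept_id] += 1
        self.current = concept_id

    def predict_next(self):
        """Return the most frequent successor of the current concept, if any."""
        successors = self.transitions.get(self.current)
        if not successors:
            return None
        return successors.most_common(1)[0][0]

    def classifier_for(self, concept_id):
        """Look up a stored classifier, or None if the concept is new."""
        return self.classifiers.get(concept_id)


# Assumed toy history of concepts; the classifiers are stand-in functions.
pool = ConceptPool()
pool.store("A", lambda x: 0)
pool.store("B", lambda x: 1)
for concept in ["A", "B", "A", "B", "A", "C", "A"]:
    pool.switch_to(concept)

likely_next = pool.predict_next()
reused = pool.classifier_for(likely_next)
```

In this sketch a recurring concept is served by a stored classifier rather than one trained from scratch, which is the source of the time saving the abstract describes. How the paper identifies concepts and weights ensemble members is not stated in this record.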

Keywords: data mining; data stream; ensemble classification; concept drift; repeatability

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
