Semi-supervised Classification of Data Stream with Concept Drift Based on Clustering Model Reuse (基于聚簇模型重用的概念漂移数据流半监督分类算法)　Cited by: 1

Authors: KANG Wei (康伟), LI Lihui (黎利辉), WEN Yimin (文益民) [1] (Guangxi Key Laboratory of Image and Graphic Intelligent Processing, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)

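Affiliation: [1] Guangxi Key Laboratory of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China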

Source: Computer Science (《计算机科学》), 2024, No. 4, pp. 124-131 (8 pages)

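Funding: Guangxi Key Research and Development Program (Guike AB21220023); National Natural Science Foundation of China (62366011); Project of the Guangxi Key Laboratory of Image and Graphic Intelligent Processing (GIIP2306).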

Abstract: In semi-supervised classification of data streams with concept drift, only a small fraction of the instances are labeled, which makes it very difficult to train the classifier, detect concept drift, and adapt the classifier to new concepts. Existing semi-supervised cluster-based classification algorithms only update the clustering models in the classifier pool incrementally and do not reuse historical clustering models effectively. This paper therefore proposes a new semi-supervised classification algorithm based on clustering model reuse, called CDCMR. First, the data stream arrives in chunks; after a chunk has been classified, a clustering model whose number of clusters is determined adaptively is trained on it. Second, several component classifiers are selected by computing the similarity between each component classifier in the classifier pool and the new clustering model. Third, the selected component classifiers are reused with the current data chunk and then combined with the clustering model into an ensemble. Next, the classifier pool is split into a replacement pool, in which newer classifiers replace older ones, and a diversity-maximization pool, and each pool is updated accordingly. Finally, the ensemble classifies the instances of the next data chunk. Experiments on several synthetic and real-world data sets show that the proposed algorithm adapts effectively to concept drift and performs significantly better than existing methods.
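The abstract names the steps of CDCMR but not its similarity measure, reuse rule, or diversity criterion. The Python sketch below only illustrates how such steps could fit together for one chunk, and every concrete rule in it is an assumption: a cluster model is represented as a list of (centroid, label) pairs, similarity is an inverse mean distance between same-label centroids, "reuse" relabels old centroids by majority vote of the current chunk's labeled points, and the diversity pool drops its most redundant member. The names `similarity`, `select`, `reuse`, `update_pools`, and `process_chunk` are hypothetical, and training the clustering model with an adaptively chosen number of clusters is omitted (hand-made models stand in for it). This is not the authors' implementation.

```python
import math
from collections import Counter, deque

# Hypothetical representation: a "cluster model" is a list of (centroid, label)
# pairs. Every rule below is an illustrative assumption, not CDCMR's definition.


def predict(model, x):
    """Classify x with the label of its nearest centroid."""
    return min(model, key=lambda cl: math.dist(cl[0], x))[1]


def similarity(m1, m2):
    """Assumed measure: 1 / (1 + mean distance from each centroid of m1 to the
    nearest same-label centroid of m2); 0.0 if some label is missing from m2."""
    total = 0.0
    for c, y in m1:
        same = [c2 for c2, y2 in m2 if y2 == y]
        total += min((math.dist(c, c2) for c2 in same), default=math.inf)
    return 1.0 / (1.0 + total / len(m1))


def select(pool, current, m):
    """Pick the m historical models most similar to the current cluster model."""
    return sorted(pool, key=lambda h: similarity(h, current), reverse=True)[:m]


def reuse(model, labeled):
    """Assumed reuse rule: relabel each old centroid by the majority label of the
    current chunk's labeled points nearest to it; keep the old label otherwise."""
    votes = [Counter() for _ in model]
    for x, y in labeled:
        i = min(range(len(model)), key=lambda j: math.dist(model[j][0], x))
        votes[i][y] += 1
    return [(c, v.most_common(1)[0][0] if v else old_y)
            for (c, old_y), v in zip(model, votes)]


def ensemble_predict(models, x):
    """Unweighted majority vote over the ensemble members."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]


def update_pools(fifo, diverse, new_model, div_size):
    """Replacement pool: a bounded deque evicts its oldest model automatically.
    Diversity pool (assumed criterion): while over capacity, drop the member with
    the highest total similarity to the other members."""
    fifo.append(new_model)
    diverse.append(new_model)
    while len(diverse) > div_size:
        worst = max(range(len(diverse)),
                    key=lambda i: sum(similarity(diverse[i], d)
                                      for j, d in enumerate(diverse) if j != i))
        diverse.pop(worst)


def process_chunk(current, labeled, fifo, diverse, m=2, div_size=3):
    """One step: select and reuse history models, build the ensemble, update pools."""
    pool = list(fifo) + [d for d in diverse if all(d is not f for f in fifo)]
    ensemble = [current] + [reuse(h, labeled) for h in select(pool, current, m)]
    update_pools(fifo, diverse, current, div_size)
    return ensemble


# Usage with hand-made models (training the adaptive-k cluster model is omitted).
fifo, diverse = deque(maxlen=3), []
old = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]
update_pools(fifo, diverse, old, div_size=3)
current = [((0.2, 0.1), "a"), ((5.1, 4.9), "b")]
labeled = [((0.1, 0.0), "a"), ((4.8, 5.2), "b")]
ens = process_chunk(current, labeled, fifo, diverse)
print(ensemble_predict(ens, (4.9, 5.0)))  # prints: b
```

The returned ensemble (current clustering model plus the reused history models) is what would classify the next chunk. A bounded `deque` keeps the replacement pool trivially first-in-first-out, while the diversity pool needs an explicit eviction rule; a real system would substitute the paper's own similarity and diversity definitions.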

Keywords: data stream; semi-supervised learning; concept drift; clustering; model reuse; ensemble learning

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
