Imbalanced data stream classification method with limited labels (有限标签下的非平衡数据流分类方法)

Authors: LI Yanhong[1,2]; LI Zhihua; ZHENG Jianxing; BAI Hexiang[1,2]; GUO Xin[1,2]

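Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China; [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China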

Source: Big Data Research (《大数据》), 2025, No. 2, pp. 107-126 (20 pages)

Funding: National Natural Science Foundation of China (No. 62272286, No. 41871286); Fundamental Research Program of Shanxi Province (No. 202203021221001, No. 202203021221021).

Abstract: Data stream classification is an important topic in data stream mining. Its core task is to quickly detect concept drift in a data stream that arrives in real time and to adjust the classification model promptly. The extreme learning machine offers fast training and good generalization performance. However, few existing data stream classification methods based on the extreme learning machine can simultaneously handle three problems common in data streams: multi-class imbalance, concept drift, and expensive labeling cost. To address this, an imbalanced data stream classification method with limited labels is proposed. The method defines a sample prediction certainty measure that combines the difference between predicted probabilities with information entropy, and proposes an uncertainty-based label request strategy. It defines a sample importance measure based on the class imbalance ratio and the sample prediction error. It also proposes a classifier update and reconstruction mechanism based on a concept drift index. Comparative experiments on six synthetic data streams and three real data streams show that the proposed method achieves better classification performance than six existing data stream classification methods.
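The abstract names the ingredients of the certainty measure (a difference between predicted probabilities, and information entropy) but does not give the formula. The sketch below is therefore only an illustration of how such a measure might be assembled, not the authors' definition. The function names, the use of the top-two probability margin, the weight `alpha`, and the threshold are all assumptions introduced here.

```python
import math


def prediction_certainty(probs, alpha=0.5):
    """Hypothetical certainty score in [0, 1] for one sample.

    Assumption: combines the margin between the two largest class
    probabilities with (1 - normalized entropy) as a weighted average.
    The paper's actual combination is not stated in the abstract.
    """
    if len(probs) < 2:
        raise ValueError("need at least two class probabilities")
    ordered = sorted(probs, reverse=True)
    margin = ordered[0] - ordered[1]  # large when one class dominates
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    normalized_entropy = entropy / math.log(len(probs))  # scaled to [0, 1]
    return alpha * margin + (1 - alpha) * (1 - normalized_entropy)


def should_request_label(probs, threshold=0.5, alpha=0.5):
    """Request the true label only when the prediction is uncertain.

    The threshold value is an illustrative assumption.
    """
    return prediction_certainty(probs, alpha) < threshold
```

Under these assumptions, a near-uniform prediction such as `[0.34, 0.33, 0.33]` scores close to 0 and triggers a label request, while a confident prediction such as `[0.98, 0.01, 0.01]` scores close to 1 and does not, which is how a strategy of this kind keeps labeling cost low.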

Keywords: data stream classification; multi-class imbalance; extreme learning machine; concept drift; expensive labeling cost

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
