Authors: LI Yanhong[1,2]; LI Zhihua; ZHENG Jianxing; BAI Hexiang[1,2]; GUO Xin[1,2]
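Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China; [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China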
Source: Big Data Research (《大数据》), 2025, No. 2, pp. 107-126 (20 pages)
Funding: National Natural Science Foundation of China (No. 62272286, No. 41871286); Fundamental Research Program of Shanxi Province (No. 202203021221001, No. 202203021221021).
Abstract: Data stream classification is an important research topic in data stream mining. Its core task is to quickly capture concept drift in a data stream that arrives in real time and to adjust the classification model promptly. The extreme learning machine offers fast training and good generalization performance. However, few existing data stream classification methods based on the extreme learning machine can simultaneously handle three problems that are common in data streams: multi-class imbalance, concept drift, and expensive labeling cost. To address this, an imbalanced data stream classification method under limited labels is proposed. The method defines a sample prediction certainty measure that combines the difference in predicted probabilities with information entropy, and proposes an uncertainty-based label request strategy. It defines a sample importance measure based on the class imbalance ratio and the sample prediction error. It also proposes an update and reconstruction mechanism for the classifier based on a concept drift index. Comparative experiments on six synthetic data streams and three real data streams show that the proposed method achieves better classification performance than six existing data stream classification methods.
Keywords: data stream classification; multi-class imbalance; extreme learning machine; concept drift; expensive labeling cost
Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
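Note: The abstract describes a prediction certainty measure that combines the difference in predicted probabilities with information entropy, and a label request strategy driven by it. The record does not give the formulas, so the following is only an illustrative sketch of how such a measure might be assembled. The function names, the weighting parameter `alpha`, the linear combination, and the `threshold` are all assumptions made here, not the authors' definitions.

```python
import math


def margin(probs):
    """Difference between the two largest predicted class probabilities."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def normalized_entropy(probs):
    """Shannon entropy divided by log(number of classes), so it lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def certainty(probs, alpha=0.5):
    """Hypothetical certainty score in [0, 1].

    Assumption: a weighted mix of the probability margin and
    (1 - normalized entropy). The paper's actual combination is not
    stated in this record.
    """
    return alpha * margin(probs) + (1 - alpha) * (1 - normalized_entropy(probs))


def should_request_label(probs, threshold=0.5, alpha=0.5):
    """Request a true label only when the prediction is not certain enough."""
    return certainty(probs, alpha) < threshold
```

Under these assumptions, a one-hot prediction gets the maximum certainty and a uniform prediction gets the minimum, so labels would be requested only for samples the classifier is unsure about. The input must be a probability vector over at least two classes.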