Authors: Du Heng[1]; Yang Juncheng[1] ([1] School of Electronic Information Engineering, Henan Polytechnic Institute, Nanyang 473000, Henan, China)
Source: Computer Applications and Software (《计算机应用与软件》), 2019, No. 12, pp. 273-281 (9 pages)
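Funding: Ministry of Education "Cloud-Data Integration for Science and Education Innovation" Fund Project (2018A10004); Henan Province Training Program for Young Backbone Teachers in Higher Education Institutions (2018GGJS230)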
Abstract: In real-time data streams, labeled samples make up only a small proportion of the data, and noisy and redundant data are abundant, which lowers real-time classification accuracy. To address this, the paper proposes a big data stream classification algorithm based on Laplacian regression active learning. A relative support difference function is designed as the classifier's decision method, and a threshold is used to assess the labeled sample size of the current data stream. A semi-supervised active learning algorithm based on constraint rules selects the most informative samples from the unlabeled sample set. A Laplacian regularized least squares regression model serves as the semi-supervised regression model and iteratively expands the labeled sample set of the data stream. Simulation results show that the proposed algorithm effectively improves the classification accuracy of data streams while meeting real-time processing requirements.
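Keywords: big data; real-time data stream; Laplacian regularized least squares; classification algorithm; semi-supervised learning; active learning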
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
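Illustrative sketch: the abstract names Laplacian regularized least squares (LapRLS) as the semi-supervised regression model but gives no formulas, so the code below is not the authors' implementation. It is a minimal linear LapRLS, assuming a dense Gaussian affinity graph and an unnormalized graph Laplacian. The function names and the parameters `gamma_a`, `gamma_i`, and `sigma` are hypothetical choices made for this example.

```python
import numpy as np


def graph_laplacian(X, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W from a dense Gaussian affinity.

    Assumption: a fully connected graph; a k-nearest-neighbour graph is a
    common alternative for larger sample sets.
    """
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return np.diag(W.sum(axis=1)) - W


def fit_linear_laprls(X_lab, y_lab, X_unlab,
                      gamma_a=1e-2, gamma_i=1e-1, sigma=1.0):
    """Fit w minimizing
        ||X_lab w - y_lab||^2 + gamma_a ||w||^2 + gamma_i w^T X^T L X w,
    where X stacks labeled and unlabeled samples and L is the graph Laplacian.
    Setting the gradient to zero gives the linear system solved below.
    """
    X_all = np.vstack([X_lab, X_unlab])
    L = graph_laplacian(X_all, sigma)
    d = X_all.shape[1]
    A = (X_lab.T @ X_lab
         + gamma_a * np.eye(d)
         + gamma_i * (X_all.T @ L @ X_all))
    return np.linalg.solve(A, X_lab.T @ y_lab)


def predict(X, w):
    """Real-valued regression output; its sign can serve as a class label."""
    return X @ w
```

In an iterative scheme like the one the abstract describes, newly labeled samples would be moved from `X_unlab` to `X_lab` and the model refit. How the paper chooses those samples (its constraint rules and relative support difference function) is not specified in this record and is not modelled here.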