Authors: WU Weijie (武炜杰), ZHANG Jingxiang (张景祥) (School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2021, No. 1, pp. 132-140 (9 pages)
Funding: National Natural Science Foundation of China (61772239, 11804123).
Abstract: Classification algorithms for dynamic data streams in which new classes emerge often detect those new classes poorly. To address this problem, a completely random forest algorithm based on k-nearest neighbors (KCRForest) is proposed. The algorithm builds the completely random trees of the forest from the known-class samples in the data stream only, and divides the sample space into a normal region and an abnormal region according to the average path length of the leaf nodes. For a sample that falls into the abnormal region, an outlier score is computed from its k nearest neighbors. If the score exceeds a preset threshold, the sample is judged to belong to a new class; otherwise it is judged to belong to a known class. For a known-class sample in the abnormal region, the class-label distribution is obtained from its k nearest neighbors; for a sample outside the abnormal region, the label distribution of the training samples originally in that region (recorded during training) is used. The sample's label is then determined by voting. Once a certain number of new-class samples have been detected, the model is updated with their information so that further new classes can be detected. To verify the effectiveness of KCRForest in detecting new classes, experiments are carried out on four UCI datasets and the algorithm is compared with existing algorithms. The results show that its new-class detection performance is better than or comparable to that of iForest+SVM and LOF+SVM, and that its classification accuracy is markedly higher than that of SENCForest.
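The detect-then-classify pipeline summarized in the abstract can be illustrated with a small, stdlib-only Python sketch. It is not the authors' implementation. The abstract does not specify the tree depth limit, the exact rule that turns leaf path lengths into a normal/abnormal boundary, the outlier-score formula, the threshold, or how many new-class samples trigger an update. So `max_depth`, `min_size`, `outlier_thr`, `buffer_size`, the "average path shorter than the forest's mean leaf depth" boundary, the LOF-like distance-ratio score, and the names `KCRForestSketch`, `build_tree`, `buffer_new` are all hypothetical choices made here for illustration.

```python
import math
import random
from collections import Counter


def build_tree(X, y, rng, depth=0, max_depth=8, min_size=3):
    """Completely random tree: random feature, random split value, no impurity criterion.
    max_depth and min_size are hypothetical stopping parameters."""
    leaf = {"leaf": True, "depth": depth, "dist": Counter(y)}  # leaf keeps its training label distribution
    if depth >= max_depth or len(X) <= min_size:
        return leaf
    f = rng.randrange(len(X[0]))                      # random feature
    lo, hi = min(r[f] for r in X), max(r[f] for r in X)
    t = rng.uniform(lo, hi)                           # random threshold inside the node's range
    left = [i for i, r in enumerate(X) if r[f] < t]
    right = [i for i, r in enumerate(X) if r[f] >= t]
    if not left or not right:                         # degenerate split (e.g. constant feature): stop
        return leaf

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], rng, depth + 1, max_depth, min_size)

    return {"leaf": False, "f": f, "t": t, "l": grow(left), "r": grow(right)}


def find_leaf(node, x):
    while not node["leaf"]:
        node = node["l"] if x[node["f"]] < node["t"] else node["r"]
    return node


def all_leaves(node):
    if node["leaf"]:
        return [node]
    return all_leaves(node["l"]) + all_leaves(node["r"])


class KCRForestSketch:
    """Hypothetical sketch of the KCRForest idea; parameter names and defaults are assumptions."""

    def __init__(self, n_trees=50, k=5, outlier_thr=2.0, buffer_size=20, seed=0):
        self.n_trees, self.k, self.outlier_thr, self.buffer_size = n_trees, k, outlier_thr, buffer_size
        self.rng = random.Random(seed)
        self.buffer, self.n_new = [], 0

    def fit(self, X, y):
        # Trees are grown from known-class samples only.
        self.X, self.y = [tuple(x) for x in X], list(y)
        self.trees = [build_tree(self.X, self.y, self.rng) for _ in range(self.n_trees)]
        depths = [lf["depth"] for tr in self.trees for lf in all_leaves(tr)]
        self.depth_thr = sum(depths) / len(depths)    # assumed boundary: mean leaf path length
        # Reference k-NN distance of the training data (each point excluding itself).
        self.ref_knn = sum(self._mean_dist(self._neighbors(x, exclude=i))
                           for i, x in enumerate(self.X)) / len(self.X)
        return self

    def _neighbors(self, x, exclude=None):
        pairs = [(math.dist(x, xi), yi) for i, (xi, yi) in enumerate(zip(self.X, self.y)) if i != exclude]
        return sorted(pairs, key=lambda p: p[0])[: self.k]

    @staticmethod
    def _mean_dist(nn):
        return sum(d for d, _ in nn) / len(nn)

    def outlier_score(self, x):
        """Assumed score: mean k-NN distance of x relative to the training data's typical k-NN distance."""
        return self._mean_dist(self._neighbors(x)) / (self.ref_knn + 1e-12)

    def predict_one(self, x):
        """Return a known label, or "new" if x is flagged as a new-class sample."""
        leaves = [find_leaf(tr, x) for tr in self.trees]
        avg_path = sum(lf["depth"] for lf in leaves) / len(leaves)
        if avg_path < self.depth_thr:                 # abnormal region: short average path
            if self.outlier_score(x) > self.outlier_thr:
                self.buffer_new(x)
                return "new"
            votes = Counter(label for _, label in self._neighbors(x))  # k-NN label distribution
        else:                                         # normal region: training label distributions in the leaves
            votes = Counter()
            for lf in leaves:
                votes.update(lf["dist"])
        return votes.most_common(1)[0][0]

    def buffer_new(self, x):
        """Buffer a detected new-class sample; retrain once buffer_size samples have accumulated."""
        self.buffer.append(tuple(x))
        if len(self.buffer) >= self.buffer_size:
            self.n_new += 1
            label = f"new-{self.n_new}"               # hypothetical naming; assumes one new class per buffer
            self.fit(self.X + self.buffer, self.y + [label] * len(self.buffer))
            self.buffer = []
```

Design notes: the splits are drawn uniformly between a node's minimum and maximum on a randomly chosen feature, so labels play no part in growing the trees; they only fill the leaf distributions used for voting. Retraining from scratch on the enlarged data is the simplest possible update rule and assumes that all buffered samples belong to a single new class. The paper's actual update procedure may differ.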
Chinese Library Classification: TP391.4 [Automation and Computer Technology / Computer Application Technology]