FS-CRF: Outlier Detection Model Based on Feature Segmentation and Cascaded Random Forest

Cited by: 2

Authors: LIU Zhen-peng; SU Nan; QIN Yi-wen; LU Jia-huan; LI Xiao-fei

Affiliations: [1] School of Cyber Security and Computer, Hebei University, Baoding, Hebei 071002, China; [2] Information Technology Center, Hebei University, Baoding, Hebei 071002, China; [3] School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

Source: Computer Science (《计算机科学》), 2020, No. 8, pp. 185-188 (4 pages)

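Funding: Natural Science Foundation of Hebei Province (F2019201427); Ministry of Education "Cloud-Data Integration Science and Education Innovation" Fund (2017A20004).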

Abstract: In the era of big data, attacks and tampering, equipment failure, human fraud, and other causes leave many abnormal values hidden in massive data. Accurately detecting outliers is critical to data cleaning. This paper proposes an outlier detection model that combines feature segmentation with a multi-level cascaded random forest (outlier detection model based on Feature Segmentation and Cascaded Random Forest, FS-CRF). A sliding window and random forests segment the original features at a fine granularity and generate class probability vectors, which are used to train the multi-level cascaded random forest; the final class of a sample is decided by a vote of the random forests in the last cascade layer. Simulation results show that the new method achieves high F1-measure scores in outlier classification tasks on multiple UCI data sets, both high- and low-dimensional. The cascade structure further improves generalization compared with the classical random forest. On high-dimensional data sets the proposed method outperforms gradient boosting decision trees (GBDT) and XGBoost, and it has fewer hyper-parameters, which makes it easier to tune and gives it better overall performance.
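The abstract does not state window widths, strides, or forest settings, so the following is only a minimal sketch of two pieces it mentions: cutting a feature vector into overlapping segments with a sliding window, and scoring predictions with the F1-measure. The function names `sliding_windows` and `f1_measure`, the default stride of 1, and the choice of label `1` for the outlier class are assumptions made for illustration, not details from the paper. In the model as described, each segment would then be passed to a random forest to produce class probability vectors; that step is left out here.

```python
from typing import List, Sequence


def sliding_windows(features: Sequence[float], width: int,
                    stride: int = 1) -> List[List[float]]:
    """Cut one feature vector into overlapping fixed-width segments.

    Hypothetical helper: width and stride are illustrative parameters.
    """
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    if width > len(features):
        raise ValueError("width must not exceed the number of features")
    # Start positions run from 0 up to the last index where a full window fits.
    return [list(features[i:i + width])
            for i in range(0, len(features) - width + 1, stride)]


def f1_measure(y_true: Sequence[int], y_pred: Sequence[int],
               positive: int = 1) -> float:
    """Return F1 = 2PR / (P + R), treating `positive` as the outlier class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = [0.1, 0.2, 0.3, 0.4, 0.5]
    print(sliding_windows(sample, width=3))
    print(f1_measure([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

A vector with d features, window width w, and stride 1 yields d - w + 1 segments, so a smaller window gives more, finer-grained segments per sample.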

Keywords: data cleaning; fine-grained features; cascaded random forest; ensemble learning; outlier detection

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]
