Authors: LI Rui-feng, YANG Hai-feng[1], CAI Jiang-hui[1], XUN Ya-ling[1], ZHOU Yong-xiang (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)
Affiliation: [1] School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2022, No. 7, pp. 1426-1431 (6 pages)
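Funding: Supported by the National Natural Science Foundation of China (61602335) and the Key Research and Development Program of Shanxi Province (201803D121059, 201903D121116).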
Abstract: Deep forest is an effective machine learning method. However, in the cascade forest module, feature selection for the subtrees of each forest is highly random, so the traditional averaging method may introduce errors into a forest's predicted probabilities and thereby degrade the performance of the whole algorithm. To address this problem, an outlier data mining algorithm based on a weighted deep forest (Weight Deep Forest, WDF) is proposed. First, a weight factor μ is defined from each forest's predicted probabilities to describe the accuracy of the forests in the current layer. Second, when the cascade forest module is constructed, μ is used as the weight of each forest in a cascade layer, which reduces the effect that random selection of root-node features has on algorithm performance. Third, to account for differences in how data samples are distributed, a local isolation factor α is redefined by computing class density, so that it describes the degree to which a data point is an outlier. Finally, the algorithm is validated on UCI datasets and LAMOST spectral data. The results show that, compared with similar algorithms, it achieves higher mining quality in outlier detection.
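The abstract says only that μ is derived from each forest's predicted probabilities and reflects its accuracy; the paper's actual formula is not given in this record. As a minimal sketch, assuming μ is each forest's held-out accuracy normalized to sum to one, the code below contrasts weighted combination of one cascade layer's class-probability vectors with the plain average that WDF replaces. The helpers `weight_factors`, `weighted_layer_output`, and `plain_average` are hypothetical names for illustration, and the local isolation factor α is not sketched because its definition is not given here.

```python
from typing import List, Sequence


def weight_factors(accuracies: Sequence[float]) -> List[float]:
    """Hypothetical weight factor mu: each forest's accuracy, normalized to sum to 1.

    Assumption for illustration only; not the paper's published formula.
    """
    total = sum(accuracies)
    if total <= 0:
        # No forest has positive accuracy: fall back to equal weights (plain average).
        return [1.0 / len(accuracies)] * len(accuracies)
    return [a / total for a in accuracies]


def weighted_layer_output(
    probas: Sequence[Sequence[float]], accuracies: Sequence[float]
) -> List[float]:
    """Combine the class-probability vectors of all forests in one cascade layer,
    weighting each forest by its (assumed) weight factor mu."""
    mu = weight_factors(accuracies)
    n_classes = len(probas[0])
    combined = [0.0] * n_classes
    for w, p in zip(mu, probas):
        for k in range(n_classes):
            combined[k] += w * p[k]
    return combined


def plain_average(probas: Sequence[Sequence[float]]) -> List[float]:
    """Traditional unweighted average of the forests' class-probability vectors."""
    n = len(probas)
    return [sum(column) / n for column in zip(*probas)]


if __name__ == "__main__":
    # Two forests disagree on one sample; the more accurate forest (0.9) dominates.
    layer_probas = [[0.9, 0.1], [0.3, 0.7]]
    layer_accuracies = [0.9, 0.6]
    print("weighted:", weighted_layer_output(layer_probas, layer_accuracies))
    print("plain:   ", plain_average(layer_probas))
```

With these inputs the weighted output is approximately [0.66, 0.34], while the plain average is approximately [0.6, 0.4]: weighting shifts the layer's estimate toward the forest that scored better, which is the effect the abstract attributes to μ.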
Keywords: deep forest; cascade forest; weight factor; isolation factor; outlier mining
CLC number: TP311 [Automation and Computer Technology / Computer Software and Theory]