Affiliation: [1] School of Computer Science, Southwest Petroleum University, Chengdu 610500, China
Source: Computer Science (《计算机科学》), 2017, No. B11, pp. 98-101 (4 pages)
Abstract: To classify imbalanced data sets effectively, this paper proposes a classifier that combines cost-sensitive learning with the random forest algorithm. First, a new impurity measure is proposed that accounts not only for the total cost of the decision tree but also for the difference in cost that the same node incurs for different samples. Second, the random forest algorithm is run: the data set is sampled K times and K base classifiers are built. Third, each decision tree is constructed with the classification and regression tree (CART) algorithm using the proposed impurity measure, and the trees together form the forest. Finally, the random forest classifies data by a voting mechanism. In experiments on data sets from the UCI database, the classifier performs well against both the traditional random forest and existing cost-sensitive random forest classifiers on three performance measures: classification accuracy, AUC, and the Kappa coefficient.
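The pipeline described in the abstract (K bootstrap samples, one CART-style tree per sample grown with a cost-aware impurity, then majority voting) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the formula of the proposed impurity measure, so the sketch substitutes a plain cost-weighted Gini impurity, which is not the paper's measure. The `cost` dictionary, the toy data, and all function names are hypothetical.

```python
import random
from collections import Counter


def cost_weighted_gini(labels, cost):
    """Gini impurity with class counts scaled by misclassification cost.

    Stand-in only: NOT the impurity measure proposed in the paper.
    """
    if not labels:
        return 0.0
    counts = Counter(labels)
    total = sum(cost[c] * n for c, n in counts.items())
    return 1.0 - sum((cost[c] * n / total) ** 2 for c, n in counts.items())


def weighted_majority(labels, cost):
    """Leaf label: the class with the largest cost-weighted count."""
    counts = Counter(labels)
    return max(sorted(counts), key=lambda c: cost[c] * counts[c])


def build_tree(X, y, cost, rng, depth=0, max_depth=4):
    """Grow a CART-style binary tree, trying a random feature subset per node."""
    if depth >= max_depth or len(set(y)) == 1:
        return weighted_majority(y, cost)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))
    best = None
    for f in rng.sample(range(n_features), k):
        # Every distinct value except the largest is a candidate threshold,
        # so both sides of the split are always non-empty.
        for t in sorted(set(row[f] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = sum(
                len(idx) / len(y) * cost_weighted_gini([y[i] for i in idx], cost)
                for idx in (left, right)
            )
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # the sampled features were constant at this node
        return weighted_majority(y, cost)
    _, f, t, left, right = best
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], cost, rng, depth + 1, max_depth),
        build_tree([X[i] for i in right], [y[i] for i in right], cost, rng, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, cost, n_trees=25, seed=0):
    """Draw one bootstrap sample per tree and grow a tree on each sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], cost, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual tree predictions."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical imbalanced toy data: 20 majority-class rows, 5 minority-class rows.
X = [[float(i % 5), float(i % 3)] for i in range(20)] + [[10.0 + i, 10.0 + i] for i in range(5)]
y = [0] * 20 + [1] * 5
cost = {0: 1.0, 1: 3.0}  # assumed: misclassifying the minority class costs 3x

forest = fit_forest(X, y, cost)
print(predict_forest(forest, [12.0, 12.0]), predict_forest(forest, [0.0, 0.0]))
```

Weighting class counts by cost makes a node containing a few expensive minority samples look as impure as one containing many cheap majority samples, which pushes splits toward isolating the minority class. The measure in the paper additionally considers per-sample cost differences at a node, which this sketch does not model.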
Keywords: cost-sensitive learning; random forest; impurity measure; classification and regression tree (CART); imbalanced data
Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]