Authors: Jieting WANG (王婕婷); Feijiang LI (李飞江); Jue LI (李珏); Yuhua QIAN (钱宇华); Jiye LIANG (梁吉业) [2] (Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China)
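Affiliations: [1] Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006 [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006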
Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), 2024, No. 1, pp. 159-190, 32 pages
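Funding: Supported by the Science and Technology Innovation 2030 Major Project (Grant No. 2021ZD0112400); the Key Program of the National Natural Science Foundation of China (Grant No. 62136005); the Young Scientists Fund of the National Natural Science Foundation of China (Grant Nos. 62106132, 62306170); the Science and Technology Major Project of Shanxi Province (Grant No. 202201020101006); and the Fundamental Research Program of Shanxi Province (Grant Nos. 20210302124271, 202103021223026).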
Abstract: Decision tree models are highly interpretable and underlie machine learning methods such as random forests and deep forests. The key problem in decision tree algorithms is choosing the split attribute and split value at each node, which affects the generalization ability, depth, balance, and other important properties of the tree. Most traditional attribute selection criteria are defined in terms of concave functions, which gives decision tree algorithms a multivalue bias: they tend to select attributes with many distinct values as the node split attribute. Previous studies have shown that evaluation criteria that alleviate random consistency can reduce classification bias and cluster-number bias. This paper alleviates the random consistency of the Gini index using a standardization framework, and thereby mitigates its multivalue bias. Experiments on artificial datasets verify that the resulting standard Gini index mitigates the multivalue bias of the Gini index and selects attributes that carry decision information. Experiments on 12 benchmark datasets and two image datasets show that the decision tree algorithm based on the standard Gini index generalizes better than existing decision tree algorithms designed to mitigate multivalue bias.
Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
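The multivalue bias described in the abstract can be illustrated with a small sketch. This record does not give the paper's definition of the standard Gini index, so the `standardized_gain` function below is only an assumed stand-in: it z-scores the ordinary Gini gain against its distribution under random label permutations. The function names, the toy data, and the permutation count are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(values, labels):
    """Impurity reduction of a multiway split on a categorical attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


def standardized_gain(values, labels, n_perm=200, seed=0):
    """ASSUMPTION, not the paper's formula: z-score of the Gini gain
    against gains obtained after randomly permuting the labels."""
    rng = random.Random(seed)
    observed = gini_gain(values, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(gini_gain(values, shuffled))
    mean = sum(null) / n_perm
    var = sum((g - mean) ** 2 for g in null) / n_perm
    if var == 0:
        # The gain never changes under permutation, so it carries no evidence.
        return 0.0
    return (observed - mean) / var ** 0.5


# Toy data: 200 samples, two balanced classes.
labels = [0] * 100 + [1] * 100
# Binary attribute that agrees with the label on 90% of the samples.
informative = [1] * 10 + [0] * 90 + [0] * 10 + [1] * 90
# Attribute with a unique value per sample (a row ID): no decision information.
ids = list(range(200))

print("Gini gain:        ", gini_gain(informative, labels), gini_gain(ids, labels))
print("Standardized gain:", standardized_gain(informative, labels),
      standardized_gain(ids, labels))
```

On this toy data the raw Gini gain ranks the ID attribute above the informative one, because every singleton group is trivially pure. The permutation-based score reverses that ranking, since the ID attribute attains the same gain for any labeling.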