Authors: GAO Hong-lei; MEN Chang-qian [1]; WANG Wen-jian [1,2] (School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China)
Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006; [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
Source: Journal of Chinese Computer Systems, 2021, No. 6, pp. 1136-1143 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (62076154, 61673249, U1805263); the Key R&D Program of Shanxi Province for International Science and Technology Cooperation (201903D421050); the Natural Science Foundation of Shanxi Province (201901D111030).
Abstract: Much progress has been made in research on decision tree (DT) classification, but some limitations remain. In particular, finding the optimal split point requires traversing every value of a feature, so recursively constructing a decision tree becomes very time-consuming on large datasets. Accelerating decision tree construction while preserving classification accuracy is therefore of practical importance. This paper first presents two methods for partitioning feature value intervals according to the data distribution, namely equal-precision feature value interval partition and variable-precision feature value interval partition. The Gini index of each selected interval is then computed to find the optimal feature and optimal split point, and the model decision tree is generated recursively. Experiments show that the algorithm effectively reduces the computational cost of building the tree, accelerates construction while maintaining classification accuracy, and avoids overfitting to some extent.
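The paper itself provides no code; the following is a minimal Python sketch of the split search the abstract describes, restricted to interval boundaries rather than every distinct feature value. It assumes a simple equal-width grid as a stand-in for the "equal-precision feature value interval partition" and uses weighted Gini impurity, Gini(D) = 1 - sum_k p_k^2, as the split criterion. The function names (best_split_equal_precision) and the parameter n_intervals are illustrative and not taken from the paper.

    import numpy as np

    def gini(labels):
        """Gini impurity of a label array: 1 - sum_k p_k^2."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def split_gini(feature, labels, threshold):
        """Weighted Gini impurity after splitting on feature <= threshold."""
        left = labels[feature <= threshold]
        right = labels[feature > threshold]
        n = len(labels)
        return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

    def best_split_equal_precision(feature, labels, n_intervals=10):
        """Search candidate thresholds on an equal-width interval grid
        instead of every distinct feature value (a sketch of the
        equal-precision interval partition idea)."""
        lo, hi = feature.min(), feature.max()
        # interior interval boundaries serve as the only candidate split points
        candidates = np.linspace(lo, hi, n_intervals + 1)[1:-1]
        best_t, best_g = None, float("inf")
        for t in candidates:
            g = split_gini(feature, labels, t)
            if g < best_g:
                best_t, best_g = t, g
        return best_t, best_g

    # usage on synthetic data: threshold near 0.3 should give low impurity
    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = (x > 0.3).astype(int)
    print(best_split_equal_precision(x, y, n_intervals=20))

With n_intervals fixed, the candidate set no longer grows with the number of distinct feature values, which is where the speed-up over an exhaustive value-by-value scan would come from; the variable-precision variant would instead place boundaries according to the data distribution.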
Keywords: decision tree; Gini index; model decision tree; equal-precision feature value interval partition; variable-precision feature value interval partition
Classification Code: TP391 [Automation and Computer Technology - Computer Application Technology]