Acceleration Algorithm of Model Decision Tree with Feature Value Interval Partition  (Cited by: 4)

Authors: GAO Hong-lei; MEN Chang-qian [1]; WANG Wen-jian [1,2]

Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2021, No. 6, pp. 1136-1143 (8 pages)

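Funding: Supported by the National Natural Science Foundation of China (62076154, 61673249, U1805263); the Key Research and Development Program for International Science and Technology Cooperation of Shanxi Province (201903D421050); and the Natural Science Foundation of Shanxi Province (201901D111030).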

Abstract: Research on decision tree (DT) classification has produced many results, but some problems remain. For example, finding the optimal split point requires traversing every value of a feature, so recursively constructing a decision tree takes a long time when the dataset is large. Accelerating tree construction while preserving classification accuracy is therefore of practical importance. This paper first presents two methods for partitioning feature value intervals according to the data distribution: equal precision feature value interval partition and variable precision feature value interval partition. It then computes the Gini index of each selected interval to find the optimal feature and the optimal split point, and finally generates the model decision tree recursively. Experiments show that the algorithm effectively reduces the computational cost of building a decision tree, accelerates construction while maintaining classification accuracy, and avoids overfitting to some extent.
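The split search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define "equal precision" partitioning, so the code assumes it means cutting each feature's value range into equal-width intervals and testing only the interval boundaries as candidate split points. The function names (`gini`, `weighted_gini`, `interval_cuts`, `best_split`) and the `n_intervals` parameter are hypothetical, and the variable precision variant is omitted because the abstract gives no detail on it.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - float(np.sum(p ** 2))


def weighted_gini(x, y, cut):
    """Size-weighted Gini impurity of the two children of the split x <= cut."""
    left = x <= cut
    n = y.size
    n_left = int(left.sum())
    return (n_left * gini(y[left]) + (n - n_left) * gini(y[~left])) / n


def interval_cuts(x, n_intervals):
    """Interior boundaries of n_intervals equal-width intervals over x.

    Assumption: this equal-width reading stands in for the paper's
    'equal precision' partition, which the abstract does not define.
    """
    lo, hi = x.min(), x.max()
    if lo == hi:  # constant feature: nothing to split on
        return np.array([])
    return np.linspace(lo, hi, n_intervals + 1)[1:-1]


def best_split(X, y, n_intervals=10):
    """Return (feature index, cut point, weighted Gini) of the best candidate.

    Only interval boundaries are evaluated, not every distinct feature value.
    """
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for cut in interval_cuts(X[:, j], n_intervals):
            score = weighted_gini(X[:, j], y, cut)
            if score < best[2]:
                best = (j, float(cut), score)
    return best
```

Under this assumption, each feature contributes at most `n_intervals - 1` candidate cut points per node, whereas an exhaustive search over a feature with m distinct values evaluates up to m - 1 candidates. A full tree builder would call `best_split` recursively on the two resulting subsets.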

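Keywords: decision tree; Gini index; model decision tree; equal precision feature value interval partition; variable precision feature value interval partition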

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
