An Improved Decision Tree Algorithm Using Harmonic Mean Optimization to Select Splitting Attributes  Cited by: 1

The Improvement Decision Tree Algorithm for Harmonic Mean Optimization on Selection Attributes


Authors: WANG Zhuo[1]; NIE Bin[2]; LUO Jigen; DU Jianqiang[2]; CHEN Ai[1]; ZHOU Li[2]

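Affiliations: [1] School of Software, Nanchang University, Nanchang, Jiangxi 330047, China; [2] School of Computer, Jiangxi University of Traditional Chinese Medicine, Nanchang, Jiangxi 330004, China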

Source: Journal of Jiangxi Normal University (Natural Science Edition), 2018, No. 4, pp. 384-388 (5 pages)

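Funding: Supported by the National Natural Science Foundation of China (61562045; 61363042); the Major Project of the Natural Science Foundation of Jiangxi Province (20152AXCB20007); the Jiangxi Provincial University Science and Technology Landing Program (LD12038); a General Project of the Jiangxi Provincial Education Science "Twelfth Five-Year" Plan (15YB005); and the Natural Science Foundation of Jiangxi University of Traditional Chinese Medicine (2013ZR0068)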

Abstract: Information gain and information gain ratio are both biased by the number of values an attribute takes. To address this, an improved decision tree algorithm is proposed that selects the splitting attribute by harmonic mean optimization. First, the information gain of every candidate splitting attribute is computed, and the attributes whose information gain is above the average are identified. Then, for each of these attributes, the harmonic mean of its information gain ratio and its information gain is computed; the attribute with the largest harmonic mean is selected to create the branch, and the decision tree is built recursively. In experiments on four datasets of different sizes, information gain, information gain ratio, the Gini index, and the proposed method are each used as the splitting criterion, and accuracy is examined on the training set, on the test set, under ten runs of 10-fold cross-validation (or five runs of 5-fold cross-validation), and as the average of these. The results show that the proposed method achieves good accuracy with short running time and has a certain degree of advantage.
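The selection step described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names (entropy, gain_and_ratio, harmonic_mean, select_attribute) are hypothetical, attributes are assumed to be categorical columns addressed by index, and the harmonic mean is assumed to be the standard unweighted form 2·G·R / (G + R) of information gain G and gain ratio R. The abstract does not say how ties or a gain equal to the average are handled, so the sketch keeps attributes whose gain is at least the average and falls back to all candidates if that shortlist is empty.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_and_ratio(rows, labels, attr):
    """Return (information gain, gain ratio) for splitting on column `attr`."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Expected entropy of the class labels after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([row[attr] for row in rows])
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def harmonic_mean(a, b):
    """Unweighted harmonic mean of two non-negative numbers (assumed form)."""
    return 2 * a * b / (a + b) if a + b > 0 else 0.0


def select_attribute(rows, labels, attrs):
    """Pick the splitting attribute among the candidate column indices `attrs`."""
    stats = {a: gain_and_ratio(rows, labels, a) for a in attrs}
    avg_gain = sum(g for g, _ in stats.values()) / len(stats)
    # Assumption: ">=" rather than ">" so equal gains are not all discarded;
    # fall back to every candidate if rounding leaves the shortlist empty.
    shortlist = [a for a in attrs if stats[a][0] >= avg_gain] or list(attrs)
    return max(shortlist, key=lambda a: harmonic_mean(*stats[a]))


# Toy data: column 0 is ID-like (all values distinct), column 1 separates
# the classes with two values, column 2 carries no class information.
rows = [("a", "x", "p"), ("b", "x", "q"), ("c", "y", "p"), ("d", "y", "q")]
labels = ["yes", "yes", "no", "no"]
best = select_attribute(rows, labels, [0, 1, 2])
```

On this toy data, columns 0 and 1 have the same information gain, so information gain alone cannot tell them apart; the harmonic mean with the gain ratio penalizes the many-valued column 0 and the sketch selects column 1. A full tree would apply select_attribute recursively to each resulting subset, as the abstract describes.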

Keywords: decision tree; information gain ratio; harmonic mean; traditional Chinese medicine information

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
