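Authors: LI Chun-sheng (李春生)[1]; JIAO Hai-tao (焦海涛); LIU Peng (刘澎)[1]; LIU Xiao-gang (刘小刚) (School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China)
Affiliation: [1] School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China
Source: Computer Technology and Development (《计算机技术与发展》), 2020, No. 5, pp. 185-189 (5 pages)
Funding: National Natural Science Foundation of China, General Program (51774090); Natural Science Foundation of Heilongjiang Province, General Program (F2015020).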
Abstract: The decision tree algorithm builds a decision tree for data analysis from the known probabilities with which sample data with different features occur. Among data classification algorithms, it is a classic classification and decision method. First, every data feature is treated as a candidate tree node and all features are traversed; each time a feature is visited, it is split and the data at the split point is recorded as the basis for measuring the purity of the resulting child nodes. Second, the recorded feature data is compared to determine the optimal feature and the optimal partition, and the sample data set is split accordingly. Finally, a decision tree that conforms to the rules is built. To address the long time that the traditional C4.5 decision tree algorithm takes to compute the information gain ratio, an improved K-C4.5 algorithm is proposed. Drawing on the ideas of the Maclaurin formula and the Taylor formula, it converts the information gain ratio formula from a logarithmic function into a non-logarithmic one, thereby reducing computation time. Tests on real data sets indicate that the improved algorithm is effective to some degree.
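Keywords: decision tree; data probability; information gain ratio; time efficiency; improved algorithm
Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]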
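The abstract describes replacing the logarithm in the information gain ratio with a non-logarithmic expression via a Maclaurin/Taylor expansion, but it does not state the resulting formula. The sketch below is therefore only an illustration under an assumption: it uses the first-order expansion ln(1 - x) ≈ -x, so that ln(p) = ln(1 - (1 - p)) ≈ -(1 - p). The function names (`entropy_exact`, `entropy_approx`, `gain_ratio`) are hypothetical and are not taken from the paper, and the K-C4.5 formula itself may differ.

```python
import math
from collections import Counter


def entropy_exact(values):
    """Shannon entropy in bits, using the logarithm as in standard C4.5."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def entropy_approx(values):
    """Log-free entropy estimate (assumed first-order expansion, not the paper's formula).

    ln(p) = ln(1 - (1 - p)) is approximated by -(1 - p), so
    -sum(p * log2(p)) becomes sum(p * (1 - p)) / ln(2).
    """
    n = len(values)
    return sum((c / n) * (1 - c / n) for c in Counter(values).values()) / math.log(2)


def gain_ratio(feature_values, labels, entropy):
    """Information gain ratio of a categorical feature under the given entropy function."""
    n = len(labels)
    # Group the class labels by feature value.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the partitions induced by the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information: entropy of the feature's own value distribution.
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0
```

Passing `entropy_exact` or `entropy_approx` as the third argument of `gain_ratio` lets the two variants be compared on the same split. With this first-order expansion the approximate entropy is proportional to the Gini impurity, so it avoids the logarithm at the cost of some accuracy for probabilities far from 1.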