Authors: CHEN Jie (陈杰) [1]; WU Chun-xue (邬春学) [1]
Affiliation: [1] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Source: Software Guide (《软件导刊》), 2018, No. 10, pp. 88-92 (5 pages)
Funding: Shanghai science program projects (16111107502; 17511107203)
Abstract: To address the inefficiency and overfitting of the C4.5 decision tree algorithm on data mining classification problems, an improved algorithm, TM-C4.5, is proposed. It mainly improves the branching and pruning strategies of C4.5. First, the attribute values are sorted in ascending order and the boundary theorem is applied to obtain the cut points at which the class distribution may change; the information gain rate of each cut point is compared with the probability obtained from a Bayesian classifier, and conditional rules determine the optimal split threshold. Second, a simplified CCP (Cost-Complexity Pruning) method and an evaluation criterion are applied: for the root node of each subtree of the generated decision tree, the surface error rate gain and the S value are computed to decide whether to delete the node and its branches. Experimental results show that decision trees generated by this algorithm classify more accurately and reasonably, indicating that TM-C4.5 is effective.
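The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on these assumptions: candidate thresholds are taken as midpoints between adjacent distinct values whose class labels differ (the usual reading of the boundary theorem); plain information gain is used as the split score; and `ccp_alpha` is the standard cost-complexity quantity, not necessarily the paper's simplified form. The Bayesian-classifier comparison and the S value are not specified in this record and are left out. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def boundary_cut_points(values, labels):
    """Candidate thresholds for a numeric attribute.

    Assumption: a cut is a candidate only when it lies between two adjacent
    distinct values that are not both pure in the same class.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, set()).add(y)
    distinct = sorted(groups)  # ascending order of distinct attribute values
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        same_pure_class = (
            len(groups[lo]) == 1 and len(groups[hi]) == 1 and groups[lo] == groups[hi]
        )
        if not same_pure_class:
            cuts.append((lo + hi) / 2)
    return cuts


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting at `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Boundary cut point with the highest information gain, or None."""
    cuts = boundary_cut_points(values, labels)
    if not cuts:
        return None
    return max(cuts, key=lambda c: information_gain(values, labels, c))


def ccp_alpha(node_error, subtree_error, n_leaves):
    """Standard CCP error-rate gain per pruned leaf: (R(t) - R(T_t)) / (|leaves| - 1).

    Requires n_leaves >= 2, i.e. the node is an internal node.
    """
    return (node_error - subtree_error) / (n_leaves - 1)
```

Restricting the search to boundary points reduces the number of thresholds that have to be scored, which is the efficiency gain the abstract refers to. In standard CCP the subtree with the smallest alpha is pruned first.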
Keywords: C4.5; TM-C4.5 algorithm; CCP; Bayesian classifier; pruning strategy; evaluation criteria
Classification: TP312 [Automation and Computer Technology: Computer Software and Theory]