Source: Journal of Southwest China Normal University (Natural Science Edition), 2016, No. 11, pp. 160-164 (5 pages)
Funding: Guangxi Natural Science Foundation, Youth Fund Project (2014GXNSFBA118283)
Abstract: Decision tree algorithms are widely used in data mining. Attribute selection is key to the mining efficiency of decision tree methods, but both the ID3 method and the C4.5 method exhibit some degree of selection bias when choosing attributes. To address this, the paper improves the information gain model by simplifying the computation of information entropy, which requires multiple logarithmic operations, into a multi-value summation. This avoids the possibility of bias in attribute selection and also speeds up decision tree construction. Experiments on student data show that, compared with the classic ID3 method, the proposed method builds a more concise decision tree. In addition, its execution time is greatly reduced as the number of data samples increases.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
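
For context on the abstract, the following is a minimal sketch of the classic ID3 information gain calculation that the paper sets out to improve. It is not the paper's simplified formula, which the abstract does not state. The toy student records, the attribute names `student_id` and `attendance`, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Classic ID3 gain: entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    # Group the class labels by the value each row takes for the attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four student records and a pass/fail outcome.
rows = [
    {"student_id": "s1", "attendance": "high"},
    {"student_id": "s2", "attendance": "high"},
    {"student_id": "s3", "attendance": "low"},
    {"student_id": "s4", "attendance": "low"},
]
labels = ["pass", "pass", "pass", "fail"]

gain_id = information_gain(rows, labels, "student_id")
gain_attendance = information_gain(rows, labels, "attendance")
print(gain_id > gain_attendance)
```

In this toy data the unique-valued `student_id` attribute receives the maximum possible gain even though it cannot generalize, which is one well-known form of selection bias in ID3. Each call to `entropy` also evaluates one logarithm per class value, which is the kind of cost the abstract says the proposed method replaces with a summation.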