Data Classification Method Based on Improved ID3 Algorithm (基于改进ID3算法的数据分类方法)

Cited by: 12

Authors: MENG Ya-lei (孟雅蕾); ZHOU Qian-ming (周千明) [1]; SHI Hong-yu (师红宇) [1]; MA Nan (马楠)

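Affiliations: [1] School of Computer Science, Xi'an Polytechnic University, Xi'an, Shaanxi 710048, China; [2] Materials and Equipment Department, PetroChina Changqing Oilfield Company, Xi'an, Shaanxi 710021, China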

Source: Computer Simulation (《计算机仿真》), 2022, No. 5, pp. 329-332, 417 (5 pages)

Funding: Scientific Research Program of the Shaanxi Provincial Department of Education (19JK0377).

Abstract: When building a decision tree, the ID3 algorithm tends to select attributes with many distinct values as branch nodes. To address this, a data classification method that controls the degree of attribute bias is proposed. The method determines an equalization coefficient from a modified information gain and an attribute bias threshold, then uses that coefficient to adjust the information gain computed by ID3. The root node and branch nodes of the decision tree are selected according to the adjusted information gain, the attributes are classified, and the tree is built. A worked example shows that the method can control multi-value bias, avoid selecting many-valued attributes as branch nodes, and improve both prediction accuracy and algorithm efficiency.
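The multi-value bias described in the abstract can be reproduced with a few lines of Python. The sketch below computes the standard ID3 information gain on a toy dataset invented for illustration, in which a row identifier (`id`) beats a genuinely informative attribute (`windy`). The `coefficients` argument is a hypothetical stand-in for the equalization coefficient: the abstract does not give the formula, so here it is simply a caller-supplied multiplier per attribute and should not be read as the authors' method.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Standard ID3 information gain of `attribute` with respect to `target`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    # Weighted entropy of the partitions induced by the attribute.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def choose_split(rows, attributes, target, coefficients=None):
    """Pick the attribute with the largest (optionally rescaled) gain.

    `coefficients` is a hypothetical per-attribute multiplier, used here only
    to show where an equalization coefficient would enter the selection step.
    """
    coefficients = coefficients or {}
    scores = {
        a: coefficients.get(a, 1.0) * information_gain(rows, a, target)
        for a in attributes
    }
    return max(scores, key=scores.get), scores


# Toy data (invented): "id" is unique per row, "windy" has only two values.
ROWS = [
    {"id": 1, "windy": "no", "play": "yes"},
    {"id": 2, "windy": "no", "play": "yes"},
    {"id": 3, "windy": "no", "play": "yes"},
    {"id": 4, "windy": "yes", "play": "yes"},
    {"id": 5, "windy": "yes", "play": "no"},
    {"id": 6, "windy": "yes", "play": "no"},
]

if __name__ == "__main__":
    print(choose_split(ROWS, ["id", "windy"], "play"))
    print(choose_split(ROWS, ["id", "windy"], "play", {"id": 0.4}))
```

With no coefficients, `id` is chosen because every one of its partitions is trivially pure, even though it cannot generalize to unseen rows. Scaling its gain down (the value `0.4` is arbitrary) lets `windy` win instead, which is the kind of correction the abstract attributes to the equalization coefficient.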

Keywords: algorithm; attribute bias; information gain; equalization coefficient

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
