Author: Zhang Yangwu (ZHANG Yang-wu), Department of Teaching for Science and Technology, China University of Political Science and Law, Beijing 102249, China
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2016, No. 6, pp. 202-204 (3 pages)
Abstract: The C4.5 algorithm uses the information gain ratio to construct a decision tree, which overcomes the bias toward attributes with many values, and it can handle continuous attributes. However, it is inefficient on large data sets, and it ignores the fact that different samples in the sample set lie at different distances from the test data. This paper proposes an improved C4.5 algorithm based on local weighting of the training set. Sample weights are defined from the Euclidean distance or the Hamming distance and written back into the training set, so that the recomputed information gain ratio reflects how differences among training samples affect the test data. When dealing with large data sets, the data set is simplified by sorting the samples by weight and applying a preset threshold, which reduces computational complexity and improves efficiency.
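The steps named in the abstract (distance-based weights, a weighted gain ratio, and threshold-based simplification) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the weighting formula, so the kernel `1 / (1 + d)` is an assumption, as are all function names and the toy data.

```python
import math
from collections import defaultdict


def hamming(a, b):
    """Count positions at which two equal-length attribute tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_weights(rows, query, distance=hamming):
    """Weight each training row by closeness to the test instance.

    ASSUMPTION: w = 1 / (1 + d). The abstract only says the weights are
    defined from the Euclidean or Hamming distance.
    """
    return [1.0 / (1.0 + distance(r, query)) for r in rows]


def simplify(rows, labels, weights, threshold):
    """Sort samples by weight (descending) and keep those >= threshold."""
    kept = sorted(
        (t for t in zip(weights, rows, labels) if t[0] >= threshold),
        key=lambda t: t[0],
        reverse=True,
    )
    return [r for _, r, _ in kept], [y for _, _, y in kept], [w for w, _, _ in kept]


def weighted_entropy(labels, weights):
    """Entropy of the class distribution, using weight mass instead of counts."""
    total = sum(weights)
    mass = defaultdict(float)
    for y, w in zip(labels, weights):
        mass[y] += w
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)


def weighted_gain_ratio(rows, labels, weights, attr):
    """Gain ratio of a categorical attribute, computed on weighted samples."""
    total = sum(weights)
    parts = defaultdict(lambda: ([], []))
    for r, y, w in zip(rows, labels, weights):
        parts[r[attr]][0].append(y)
        parts[r[attr]][1].append(w)
    cond_entropy, split_info = 0.0, 0.0
    for ys, ws in parts.values():
        p = sum(ws) / total
        cond_entropy += p * weighted_entropy(ys, ws)
        split_info -= p * math.log2(p)
    gain = weighted_entropy(labels, weights) - cond_entropy
    return gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): two categorical attributes, binary class.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
query = ("sunny", "hot")

weights = local_weights(rows, query)
small_rows, small_labels, small_weights = simplify(rows, labels, weights, 0.5)
best_attr = max(range(2), key=lambda a: weighted_gain_ratio(rows, labels, weights, a))
```

Replacing counts with weight mass keeps the usual C4.5 formulas intact while letting nearby samples dominate the split choice; with all weights equal to 1 the function reduces to the ordinary gain ratio.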