Institution: [1] School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200072
Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 1, pp. 29-32 (4 pages)
Funding: Shanghai Science and Technology Program Fund (097;007;14000)
Abstract: Decision tree classification is an effective method for classification tasks in data mining and pattern recognition; however, its running efficiency degrades severely on large-scale data sets. This paper takes C4.5, a representative decision tree algorithm, as its research object and optimizes it by exploiting the algorithm's inherent parallelism. A serial C4.5 decision tree is first implemented in MATLAB, and the running time of the sub-functions that build the tree is analyzed, which identifies the cost of computing the information gain ratio as the key factor limiting the algorithm's speed. To remove this bottleneck, and drawing on the parallelism available in splitting child nodes and in selecting the best split attribute, the data are partitioned vertically (by attribute) and a parallel C4.5 decision tree is constructed, designed and implemented with MATLAB's parallel computing pool and SPMD. Running-time measurements on the parallelized tree show that parallelizing C4.5 significantly shortens the tree construction time, accelerating the algorithm.
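The abstract names gain-ratio computation as the bottleneck and parallelizes the best-split-attribute search over a vertical (column-wise) partition of the data. The paper's own implementation uses MATLAB's parallel pool and SPMD and is not reproduced here. The following is only a minimal Python sketch of the same idea, assuming discrete attribute values: each worker scores the C4.5 gain ratio (information gain divided by split information) for its own block of attribute columns, and the best split attribute is the maximum over all blocks. The function names (`entropy`, `gain_ratio`, `best_split_attribute`), the round-robin block assignment, and the use of a thread pool are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def entropy(values):
    """Shannon entropy (base 2) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(column, labels):
    """C4.5 gain ratio of one discrete attribute column (assumed discrete)."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy of the class labels after splitting on this attribute.
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - cond
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(column)
    return gain / split_info if split_info > 0 else 0.0


def _score_block(args):
    """Worker: score every attribute in one vertical block of columns."""
    block, labels = args
    return [(idx, gain_ratio(col, labels)) for idx, col in block]


def best_split_attribute(columns, labels, workers=2, executor_cls=ThreadPoolExecutor):
    """Vertically partition the attributes into blocks and score blocks in parallel.

    Returns (attribute_index, gain_ratio) of the best attribute.
    A thread pool keeps the sketch simple; for real CPU parallelism in CPython,
    ProcessPoolExecutor (run under an `if __name__ == "__main__":` guard) could be
    passed as executor_cls instead.
    """
    indexed = list(enumerate(columns))
    # Hypothetical round-robin assignment of columns to workers.
    blocks = [indexed[i::workers] for i in range(workers)]
    with executor_cls(max_workers=workers) as pool:
        results = pool.map(_score_block, [(b, labels) for b in blocks])
    scores = [s for block in results for s in block]
    return max(scores, key=lambda t: t[1])
```

Each worker needs only its own attribute columns plus the shared class labels, which is what makes the vertical partition attractive: the per-attribute gain-ratio computations are independent, and only one small (index, score) pair per attribute has to be combined at the end.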
Classification number: TP301 [Automation and Computer Technology: Computer Systems Architecture]