结构与纯度结合的新型决策树分裂准则  

Novel Splitting Criteria for Decision Trees with Combination of Structure and Purity

在线阅读下载全文

作  者:杜斐 陈松灿[1,2] DU Fei;CHEN Songcan(College of Computer Science and Technology,Nanjing University of Aeronautics&Astronautics,Nanjing 211106,China;MIIT Key Laboratory of Pattern Analysis and Machine Intelligence,Nanjing 211106,China)

机构地区:[1]南京航空航天大学计算机科学与技术学院,南京211106 [2]工信部模式分析和机器智能重点实验室,南京211106

出  处:《南京航空航天大学学报》2023年第3期534-543,共10页Journal of Nanjing University of Aeronautics & Astronautics

基  金:国家自然科学基金(62076124)。

摘  要:决策树(Desision tree,DT)生长关键步骤的分裂或分叉准则通常根据纯度和误分类误差等实现,分裂生长分为轴平行和非轴平行方式。这些分裂准则一般与数据内在结构(如类别是否是多簇或单簇组成)无关。为了弥补这一缺失,本文提出了两种混合分裂准则,分别用加权和两步法将同类内的节点间距(Between-node margin within the same class,BNM)和同一节点内的类紧性(Within-class compactness and between-class separation in the same inner node,CSN)与纯度度量相结合。由于传统决策树以贪婪方式生长,仅能确定出当前的一个局部最优分裂点,为改善这个缺点,本文首先根据纯度确定出前k个候选分裂点,然后通过最大化BNM和最小化CSN确定最终的分裂点,不仅缓和了纯度上的局部最优性,而且引入了数据结构的全局性,因此能较大程度地改进后代节点的分裂,增强树的泛化性和可解释性。将上述两种分裂准则组合还可以进一步提升性能。在21个标准验证数据集上的比较结果表明:新准则下的决策树不仅提高了预测性能、降低了复杂性,而且相比于其他采用混合分裂准则的DTs更具竞争力。As a critical part of desision tree(DT)growth,its nodes can be split to grow by either axis-or nonaxis-aligned way based on such splitting criteria as purity and misclassification error.However,these have nothing to do with the geometric structure of data,e.g.multicentric data or single-center data.In order to compensate for this,two splitting criteria are proposed by combining the between-class margin in the same inner node(BCM)and the between-node margin within the same class(BNM)respectively with the purity measure in weighting and the two-step method.Unlike traditional greedy growth of DT which only finds the current locally optimal splitting point,the proposed method first selects the top-k purity splitting nodes,then determines the optimal one by maximizing BCM and minimizing BNM.Since not only alleviating purity-based local optimality but also considering global structures of the data,our method greatly improves the division of descendant nodes and generalization of the formed trees,while enhancing the interpretability.In addition,two aforementioned splitting criteria can be combined to further boost the performance.The comparison results on 21 benchmark datasets show an improvement in predictive performance of new trees with reduction in complexity,while also are competitive with many other DTs using hybrid splitting criteria.

关 键 词:决策树 分裂准则 全局结构 纯度 数据结构 

分 类 号:TP181[自动化与计算机技术—控制理论与控制工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象