A hierarchical classification model for class-imbalanced data  Cited by: 4

Authors: Shi Peibei (施培蓓)[1], Liu Guiquan (刘贵全)[2], Wang Zhong (汪中)[3], Wei Bing (卫兵)[1]

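Affiliations: [1] Department of Public Computer Teaching, Hefei Normal University, Hefei, Anhui 230601; [2] School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027; [3] Digital Technology Department, No. 38 Research Institute, China Electronics Technology Group Corporation, Hefei, Anhui 230088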

Source: Journal of University of Science and Technology of China (《中国科学技术大学学报》, JUSTC), 2015, No. 1, pp. 61-68 (8 pages)

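Funding: National Key Technology R&D Program (2012BAH17B03); Natural Science Foundation of Anhui Province (1408085MF131); Natural Science Project of Anhui Higher Education Institutions (KJ2013B212); Open Project of the Hunxin (魂芯) DSP Industrialization Research Institute, Hefei Normal University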

Abstract: Traditional machine learning methods classify poorly when the data are class-imbalanced. To address this, a hierarchical classification model for class-imbalanced data is proposed. The model uses AdaBoost as its base classifier, builds a mathematical model from the classifier's false positive rate and its features, and proves that the parameters of the hierarchical classification model can be computed. First, the model is built with a hierarchical classification tree as its structure. Next, the classification cost of this tree structure is calculated, which yields a quantitative mathematical description of the relationship between the model's cost and the features of each layer. The classification cost is then converted into an optimization problem, and the procedure for solving that problem is given, together with the computed results for the hierarchical classification model. Extensive experiments on UCI datasets, with AUC and F-measure as the evaluation criteria, show that the hierarchical classification model outperforms existing class-imbalanced classification methods.

Keywords: machine learning; class imbalance; hierarchical classification; feature; evaluation criteria

Classification code: TP391 [Automation and Computer Technology / Computer Application Technology]
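
Note on the evaluation criteria: the abstract names AUC and F-measure as its metrics but gives no formulas. The sketch below uses only their standard textbook definitions, written in plain Python. The function names `f_measure` and `auc` and the toy labels and scores are assumptions made for illustration; none of it is taken from the paper's experiments.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """F-measure of the positive (minority) class; labels are 0 or 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced sample: 2 positives, 8 negatives (made-up numbers).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(f_measure(y_true, y_pred))   # F1 of the minority class
print(auc(y_true, scores))         # threshold-free ranking quality
print(f_measure(y_true, [0] * 10)) # always predicting the majority class
```

On this toy sample, a classifier that always predicts the majority class reaches 80% accuracy yet has an F-measure of 0. That gap is the usual reason for preferring these two metrics over plain accuracy on imbalanced data.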

 
