An Ensemble Algorithm Based on Random Patches and Weighted Sampling for Imbalanced Data Classification

Author: WEI Xun (魏勋), Software Engineering Academy, Jiangxi University of Science and Technology, Nanchang 330000, China

Source: Software Guide (《软件导刊》), 2025, Issue 3, pp. 43-47 (5 pages)

Abstract: In many real-world problems, datasets are class-imbalanced, which can degrade the performance of learning algorithms. Many class imbalance learning methods have been proposed to handle such skewed datasets, including a number of ensemble methods, which are favored for their efficiency. However, most of these ensemble methods build the ensemble at the sample level and neglect the feature level, and conventional random sampling pays too little attention to the boundary region, which usually contains the samples that are hardest to classify. To address these shortcomings, an ensemble sampling algorithm named BRPE is proposed. First, the feature set is sampled. Second, the majority class samples are down-sampled, with each sample weighted by its closest Euclidean distance to the minority class samples, which yields a balanced random patch that serves as a training subset. Third, a base learner is trained on that training subset. Finally, the outputs of all base learners are combined into the prediction. Detailed experiments were conducted on 10 synthetic datasets and 8 real-world datasets. The results show that BRPE achieves higher F1 and AUC values than four other ensemble methods for class imbalance.
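The four steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not say how the distance weight maps to a sampling probability, which base learner is used, or how outputs are combined. Here the sampling weight is assumed to be the inverse of the nearest-minority distance (so boundary samples are favored), the base learner is a hypothetical nearest-centroid scorer (`CentroidLearner`), and outputs are combined by averaging scores. The names `fit_brpe`, `predict_brpe`, `feature_fraction`, and `n_learners` are likewise illustrative.

```python
import math
import random


def project(x, feats):
    """Keep only the sampled feature columns of one sample."""
    return [x[f] for f in feats]


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def weighted_sample(items, weights, k, rng):
    """Draw k distinct items, favoring large weights (Efraimidis-Spirakis keys)."""
    keyed = sorted(
        ((rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)),
        reverse=True,
    )
    return [items[i] for _, i in keyed[:k]]


class CentroidLearner:
    """Hypothetical stand-in base learner: compares distances to class centroids."""

    def __init__(self, feats, minority, majority):
        self.feats = feats
        self.c_min = self._centroid(minority)
        self.c_maj = self._centroid(majority)

    def _centroid(self, rows):
        cols = zip(*(project(r, self.feats) for r in rows))
        return [sum(c) / len(rows) for c in cols]

    def score(self, x):
        """Score in [0, 1]; higher means closer to the minority centroid."""
        z = project(x, self.feats)
        d_min, d_maj = euclid(z, self.c_min), euclid(z, self.c_maj)
        total = d_min + d_maj
        return 0.5 if total == 0 else d_maj / total


def fit_brpe(majority, minority, n_learners=10, feature_fraction=0.5, seed=0):
    rng = random.Random(seed)
    n_features = len(minority[0])
    k = max(1, round(feature_fraction * n_features))
    learners = []
    for _ in range(n_learners):
        # Step 1: sample a feature subset.
        feats = rng.sample(range(n_features), k)
        proj_min = [project(m, feats) for m in minority]
        # Step 2: weight each majority sample by its closest distance to the
        # minority class (assumption: inverse distance), then down-sample
        # until the patch is balanced.
        weights = [
            1.0 / (min(euclid(project(x, feats), m) for m in proj_min) + 1e-12)
            for x in majority
        ]
        n_keep = min(len(minority), len(majority))
        patch_majority = weighted_sample(majority, weights, n_keep, rng)
        # Step 3: train one base learner on the balanced random patch.
        learners.append(CentroidLearner(feats, minority, patch_majority))
    return learners


def predict_brpe(learners, x, threshold=0.5):
    # Step 4: combine base-learner outputs (assumption: mean score).
    avg = sum(learner.score(x) for learner in learners) / len(learners)
    return 1 if avg >= threshold else 0
```

Samples are plain lists of floats, and label 1 denotes the minority class. Any classifier that outputs a score could replace the centroid learner; it is used here only to keep the sketch free of third-party dependencies.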

Keywords: imbalanced data; class imbalance learning; ensemble algorithm; weighted sampling; random patches

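Classification codes: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; TP311.13 [Automation and Computer Technology: Control Science and Engineering]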
