Authors: 孙中彬 (SUN Zhong-bin) [1,2]; 刁宇轩 (DIAO Yu-xuan) [1,2]; 马苏洋 (MA Su-yang) [1,2]
Affiliations: [1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China [2] Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou, Jiangsu 221116, China
Source: Acta Electronica Sinica (《电子学报》), 2024, No. 10, pp. 3392-3408 (17 pages)
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 2021QN1075).
Abstract: Multi-label classification tasks are common in real-world applications, but they often suffer from imbalanced data, which seriously degrades classification performance. The mainstream remedy is resampling, which is mainly divided into over-sampling and under-sampling: over-sampling generates samples associated with minority class labels, while under-sampling removes samples associated with majority class labels. However, these methods each address only one type of imbalance, either intra-label imbalance or inter-label imbalance, so resolving one type may introduce the other. To address this issue, this paper proposes ESUS (Ensemble learning method based on Safe Under-Sampling), an ensemble learning method for imbalanced multi-label data. First, label partitioning divides the imbalanced multi-label dataset into single-label datasets and label-pair datasets. For the single-label datasets, a safe under-sampling method is proposed to resolve intra-label imbalance, and binary classification models are built on the resulting balanced datasets. For the label-pair datasets, the data are pruned and ensemble learning is then applied to resolve inter-label imbalance, reducing time and space complexity while maintaining classification performance. Finally, the single-label dataset models and the label-pair dataset models are combined into the final classification model. Experimental results on six imbalanced multi-label datasets show that, compared with seven baseline methods, ESUS is more stable and effective on four evaluation metrics.
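The partition-then-resample idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ESUS implementation: the abstract does not specify how safe under-sampling or data pruning work, so plain random under-sampling stands in for the former and pruning is omitted. The function names `partition_labels`, `random_under_sample`, and `imbalance_ratio` are hypothetical and chosen only for this illustration.

```python
import random
from itertools import combinations


def partition_labels(Y):
    """Split a 0/1 label matrix (list of rows) into two views:
    one target vector per single label, and one vector of
    (label_a, label_b) tuples per label pair."""
    n_labels = len(Y[0])
    single = {j: [row[j] for row in Y] for j in range(n_labels)}
    pairs = {
        (a, b): [(row[a], row[b]) for row in Y]
        for a, b in combinations(range(n_labels), 2)
    }
    return single, pairs


def imbalance_ratio(y):
    """Majority count divided by minority count for a binary vector."""
    pos = sum(y)
    neg = len(y) - pos
    if min(pos, neg) == 0:
        return float("inf")
    return max(pos, neg) / min(pos, neg)


def random_under_sample(y, seed=0):
    """Keep every minority sample and an equally sized random subset
    of the majority class. ASSUMPTION: this is ordinary random
    under-sampling, used as a stand-in for the paper's safe variant."""
    rng = random.Random(seed)
    pos = [i for i, v in enumerate(y) if v == 1]
    neg = [i for i, v in enumerate(y) if v == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


# Toy label matrix: 6 samples, 3 labels (illustrative data only).
Y = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
single, pairs = partition_labels(Y)
idx = random_under_sample(single[0])
balanced = [single[0][i] for i in idx]
print(imbalance_ratio(single[0]), imbalance_ratio(balanced))
```

In a fuller pipeline, the indices returned for each label would select the training rows for that label's binary classifier, and the label-pair views would feed a separate set of models before the two groups are combined.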
Keywords: multi-label classification; imbalanced data; label partitioning; safe under-sampling; data pruning; ensemble learning
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]