Authors: ZHOU Chuanhua [1,2], REN Taijiao, LUO Lan, ZHOU Hao (周传华, 任太娇, 罗岚, 周昊)
Affiliations: [1] School of Management Science and Engineering, Anhui University of Technology, Ma'anshan, Anhui 243032, China; [2] School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China
Source: Computer and Modernization (《计算机与现代化》), 2024, No. 9, pp. 95-100, 113 (7 pages)
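Funding: National Natural Science Foundation of China (71772002, 61702006); Anhui Higher Education Key Laboratory of Multidisciplinary Management and Control of Complex Systems (CS2020-04).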
Abstract: When data are rebalanced with a single resampling method, the process tends to generate redundant samples and to delete samples that carry important information. To overcome this limitation, this paper proposes a boundary hybrid resampling algorithm for imbalanced data based on joint entropy. The algorithm first introduces a boundary factor to separate the boundary set from the non-boundary set. It then constructs a joint-entropy indicator system to judge the importance of each minority-class sample in the boundary set and, according to that importance, assigns different oversampling methods and sampling quantities to the resulting subgroups of minority-class samples. Finally, the NearMiss-2 algorithm screens the majority-class samples in the non-boundary set and removes the ones it filters out, bringing the data into relative balance. Comparative experiments on nine UCI datasets show that the algorithm improves F1-Score, G-mean and AUC, which validates its effectiveness and indicates good classification performance on imbalanced data.
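The abstract only outlines the pipeline, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The boundary factor here is a hypothetical stand-in: the share of a sample's k nearest neighbours that carry a different label, with any sample scoring above a threshold placed in the boundary set. `joint_entropy` is the standard Shannon joint entropy of two discrete variables; how the paper combines such terms into its indicator system is not described in this record. `nearmiss2` follows the usual NearMiss-2 rule (keep the majority samples whose mean distance to their k farthest minority samples is smallest). The oversampling step is left out because the record does not name the methods used. All function names and parameters (`k`, `threshold`, `n_keep`) are illustrative.

```python
import math
from collections import Counter

# Illustrative sketch only; the boundary-factor rule is an assumption,
# not the definition used in the paper.


def boundary_factor(X, y, i, k=5):
    """Hypothetical boundary factor: share of the k nearest neighbours of X[i]
    whose label differs from y[i] (0 = interior, 1 = fully surrounded)."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    neighbours = [j for _, j in dists[:k]]
    return sum(y[j] != y[i] for j in neighbours) / len(neighbours)


def split_boundary(X, y, k=5, threshold=0.0):
    """Return (boundary, non_boundary) index lists; a sample counts as a
    boundary sample when its boundary factor exceeds `threshold` (assumed rule)."""
    boundary, non_boundary = [], []
    for i in range(len(X)):
        target = boundary if boundary_factor(X, y, i, k) > threshold else non_boundary
        target.append(i)
    return boundary, non_boundary


def joint_entropy(xs, ys):
    """Shannon joint entropy H(X, Y), in bits, of two paired discrete sequences."""
    n = len(xs)
    counts = Counter(zip(xs, ys))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def nearmiss2(X, majority_idx, minority_idx, n_keep, k=3):
    """NearMiss-2 selection: score each majority sample by its mean distance to
    its k farthest minority samples and keep the n_keep lowest-scoring ones."""
    if not minority_idx:
        raise ValueError("NearMiss-2 needs at least one minority sample")
    scores = []
    for i in majority_idx:
        far = sorted((math.dist(X[i], X[j]) for j in minority_idx), reverse=True)[:k]
        scores.append((sum(far) / len(far), i))
    scores.sort()
    return [i for _, i in scores[:n_keep]]
```

For instance, with `k=2`, a class-1 point placed among three class-0 points gets a boundary factor of 1.0 and pulls those three points into the boundary set (each scores 0.5), while a separate, pure class-1 cluster stays in the non-boundary set. Distances are computed by brute force for readability; a k-d tree or similar index would be the usual choice on real datasets.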
CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]