Authors: REN Yanping, ZHENG Zhong, JIANG Yifei, YAN Yuanting[1], ZHANG Yanping[1] (College of Computer Science and Technology, Anhui University, Hefei 230601, China)
Affiliation: [1] College of Computer Science and Technology, Anhui University, Hefei 230601
Source: Computer Engineering and Applications, 2022, No. 23, pp. 268-277 (10 pages)
Funding: National Natural Science Foundation of China (61806002, 61872002).
Abstract: Undersampling is one of the mainstream approaches to the class imbalance problem. Existing research shows that handling class overlap efficiently can substantially improve the performance of oversampling methods. Most current work on undersampling, however, attributes poor performance mainly to the loss of key samples caused by an inappropriate sample selection strategy. Researchers have therefore proposed a series of methods, from different perspectives, for selecting informative majority samples, but class overlap in undersampling has received little attention. This paper proposes an undersampling method based on Bayes posterior probability and distribution density (BPDDUS), which first detects and cleans samples in overlapping regions and then undersamples the cleaned samples according to their distribution information. Specifically, the method uses Bayes posterior probability to remove potential noise and overlapping samples from the majority class, which sharpens the classification decision boundary. For the cleaned majority samples, global distribution density and information entropy are introduced to measure how important each sample is for imbalanced classification learning, and each sample is assigned a corresponding sampling weight. Finally, the majority class is undersampled according to these weights and an ensemble classification system is built to improve the model's generalization ability. Numerical experiments on 43 datasets from the KEEL repository verify the effectiveness of the proposed BPDDUS method.
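The abstract describes a pipeline (posterior-based cleaning, density/entropy weighting, weighted undersampling, ensemble) without giving formulas. The Python sketch below illustrates one possible reading of that pipeline under explicit assumptions; it is not the authors' BPDDUS implementation. Assumptions: the posterior P(majority | x) is approximated by the share of majority labels among the k nearest neighbours; "global density" is taken as 1 / (1 + mean distance to the other retained majority samples); the entropy term is the Shannon entropy of the neighbours' class labels; the two are combined by a simple product; and sampling without replacement uses Efraimidis-Spirakis weighted keys. All function names (`clean_majority`, `sampling_weights`, `weighted_undersample`, `balanced_subsets`) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence


def _knn(idx: int, X: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k nearest neighbours of X[idx] (Euclidean), excluding idx itself."""
    others = [j for j in range(len(X)) if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def clean_majority(X, y, majority, k=5, threshold=0.5) -> List[int]:
    """Keep majority samples whose k-NN estimate of P(majority | x) is >= threshold.

    Assumption: the posterior is approximated by the share of majority labels
    among the k nearest neighbours; samples below the threshold are treated
    as noise/overlap and dropped.
    """
    kept = []
    for i, label in enumerate(y):
        if label != majority:
            continue
        neigh = _knn(i, X, k)
        posterior = sum(y[j] == majority for j in neigh) / len(neigh)
        if posterior >= threshold:
            kept.append(i)
    return kept


def sampling_weights(X, y, kept, k=5, eps=1e-3) -> Dict[int, float]:
    """Hypothetical weight = (neighbourhood label entropy + eps) * global density."""
    weights = {}
    for i in kept:
        # "Global density": inverse of the mean distance to all other retained majority samples.
        dists = [math.dist(X[i], X[j]) for j in kept if j != i]
        density = 1.0 / (1.0 + sum(dists) / len(dists)) if dists else 1.0
        # Shannon entropy (bits) of the class labels among the k nearest neighbours.
        counts = Counter(y[j] for j in _knn(i, X, k))
        total = sum(counts.values())
        entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
        # eps keeps every weight strictly positive, as the sampler below requires.
        weights[i] = (entropy + eps) * density
    return weights


def weighted_undersample(weights: Dict[int, float], n: int, rng: random.Random) -> List[int]:
    """Draw n indices without replacement; larger weight means more likely to be kept.

    Efraimidis-Spirakis keys u**(1/w), computed in log space as log(u)/w.
    """
    # 1.0 - rng.random() lies in (0, 1], so the logarithm is always defined.
    keys = {i: math.log(1.0 - rng.random()) / w for i, w in weights.items()}
    return sorted(sorted(keys, key=keys.get, reverse=True)[:n])


def balanced_subsets(X, y, majority, n_subsets=3, k=5, seed=0) -> List[List[int]]:
    """Build n_subsets balanced index sets: all minority samples plus a weighted majority sample."""
    rng = random.Random(seed)
    kept = clean_majority(X, y, majority, k)
    minority = [i for i, label in enumerate(y) if label != majority]
    weights = sampling_weights(X, y, kept, k)
    n = min(len(minority), len(kept))
    return [sorted(minority + weighted_undersample(weights, n, rng))
            for _ in range(n_subsets)]
```

Each returned subset is balanced: it holds every minority sample plus an equal number of weight-sampled majority samples. Training one base classifier per subset and combining their votes would correspond to the ensemble step, which the sketch leaves out. Computing the keys as log(u)/w instead of u**(1/w) keeps the ordering identical while avoiding floating-point underflow when weights are small.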
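Keywords: imbalanced data; undersampling; Bayes posterior probability; global distribution density; ensemble classification; information entropy
CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]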