Authors: BAI Lin (白琳) [1,2]; JU Tong (俱通) [1,2]; WANG Hao (王浩) [1,2]; LEI Mingzhu (雷明珠) [1,2]; PAN Xiaoying (潘晓英) [1,2]
Affiliations: [1] School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, Shaanxi, China; [2] Shaanxi Province Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an 710121, Shaanxi, China
Source: Journal of Shandong University (Engineering Science) (《山东大学学报(工学版)》), 2024, No. 4, pp. 59-66 (8 pages)
Funding: Shaanxi Province Key Research and Development Program (2023-YBSF-476); Xi'an University of Posts and Telecommunications Innovation Fund (CXJJYL2022043).
Abstract: To address the pseudo-balancing problem that under-sampling introduces when handling imbalanced data, a boosted equalization ensemble learning algorithm based on under-sampling is proposed. A new equalization sampling mechanism coordinates the prediction probabilities of the data through a binning operation to generate high-quality training subsets, which are then used to train the classifiers iteratively. During iteration, each base classifier is adaptively assigned a weight based on its false-positive and false-negative rates on the original data, which prevents poorly performing classifiers from influencing the overall decision and improves the generalization ability of the ensemble model. The new algorithm increases the recognition of majority-class samples while eliminating pseudo-balancing, thereby reducing the impact of boundary ambiguity on the classification model. Comparative experiments on 18 small datasets and 2 large datasets show that the algorithm has advantages in classifying imbalanced data.
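The abstract names two ingredients without giving formulas: sampling the majority class across bins of predicted probability, and weighting each base classifier by its false-positive and false-negative rates. The sketch below is only an illustration of those two ideas under stated assumptions, not the authors' algorithm. The function names, the equal-width bins with round-robin selection, and the weight formula `1 - (FPR + FNR) / 2` are all hypothetical choices made for this example.

```python
import numpy as np


def binned_undersample(scores, n_keep, n_bins=5, seed=0):
    """Pick n_keep indices from `scores`, spread evenly over score bins.

    `scores` are assumed to be predicted probabilities in [0, 1] for the
    majority-class samples. Equal-width bins and round-robin selection are
    assumptions of this sketch, not details taken from the paper.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    if n_keep > scores.size:
        raise ValueError("n_keep exceeds the number of available samples")
    # A score of exactly 1.0 is clamped into the last bin.
    bin_ids = np.minimum((scores * n_bins).astype(int), n_bins - 1)
    # Shuffle the members of each bin so picks within a bin are random.
    pools = [rng.permutation(np.flatnonzero(bin_ids == b)) for b in range(n_bins)]
    chosen = []
    depth = 0
    # Round-robin: take one sample per non-exhausted bin on each pass.
    while len(chosen) < n_keep:
        for pool in pools:
            if depth < len(pool) and len(chosen) < n_keep:
                chosen.append(int(pool[depth]))
        depth += 1
    return np.array(chosen, dtype=int)


def error_rates(y_true, y_pred):
    """Return (false-positive rate, false-negative rate) for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    neg = y_true == 0
    pos = y_true == 1
    fpr = float(np.mean(y_pred[neg] == 1)) if neg.any() else 0.0
    fnr = float(np.mean(y_pred[pos] == 0)) if pos.any() else 0.0
    return fpr, fnr


def classifier_weight(fpr, fnr):
    """Hypothetical weight: one minus the mean of the two error rates."""
    return 1.0 - 0.5 * (fpr + fnr)
```

In an ensemble loop, one might call `binned_undersample` on the majority-class scores produced by the current ensemble, train the next base classifier on the selected subset plus all minority samples, and then scale its vote by `classifier_weight(*error_rates(y, y_pred))` evaluated on the original data.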
Keywords: under-sampling; class imbalance; imbalanced learning; ensemble learning; imbalanced data classification
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]