Authors: ZHANG Jian-tong (张建同)[1]; LI Jun-chang (李君昌); WANG Lai (王来); FAN Chong-jun (樊重俊)[3]
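Affiliations: [1] School of Economics and Management, Tongji University, Shanghai 200092, China; [2] Ouyeel Co., Ltd., Shanghai 201999, China; [3] Business School, University of Shanghai for Science and Technology, Shanghai 200093, China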
Source: Systems Engineering (《系统工程》), 2024, No. 3, pp. 136-148 (13 pages)
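Funding: National Natural Science Foundation of China (71971156, 72371188); Fundamental Research Funds for the Central Universities, Tongji University (22120210241); China Scholarship Council (202206260238).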
Abstract: Accurately identifying the classes of imbalanced data with class overlap is of considerable theoretical and practical value. First, building on the Switching ensemble learning framework, this paper defines the probability that a sample's class label is switched by combining the sample's degree of class overlap with its neighborhood distribution, and proposes SwitchingHD, an ensemble learning algorithm for classifying imbalanced data with class overlap. The method increases the visibility of minority-class samples while fully retaining their true information, which overcomes the limitations of existing Switching ensemble algorithms on such data. Second, under three evaluation metrics, SwitchingHD is compared with three Switching-based ensemble algorithms and two traditional ensemble algorithms on 33 imbalanced datasets with class overlap. Third, the sensitivity of the six ensemble algorithms' classification performance to the proportion of samples to be switched and to the number of base classifiers is analyzed, yielding a range for the optimal switching proportion and the effect of both factors. The analysis shows that SwitchingHD performs significantly better than the other ensemble algorithms in terms of AUC, indicating that it is effective and superior for classifying imbalanced data with class overlap. Finally, using telecom customer data from one region as an example, SwitchingHD is further compared with 11 recent ensemble learning algorithms on identifying potential churn customers.
Keywords: imbalanced data classification; class overlap; neighborhood distribution; Switching algorithm
Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]