Authors: ZOU Chun-an (邹春安), WANG Jia-bao (王嘉宝), FU Guang-hui (付光辉); School of Science, Kunming University of Science and Technology, Kunming 650500, China
Source: Software Guide (《软件导刊》), 2022, No. 3, pp. 34-41 (8 pages)
Funding: National Natural Science Foundation of China (11761041).
Abstract: Imbalanced classification is a current research focus and a persistent difficulty in machine learning. To improve classification performance on imbalanced data, this paper proposes RS-MetaCost, an imbalanced classification algorithm that combines MetaCost with resampling. First, before MetaCost generates its subsets, the imbalanced dataset is resampled, either by oversampling the minority class or by undersampling the majority class, to reduce or eliminate the class imbalance. Second, in the probability-prediction stage, m-estimation is used to raise the predicted probability of the minority class. RS-MetaCost is compared with classical algorithms on 6 simulated datasets and 10 real-world datasets. The results show that on most datasets RS-MetaCost improves the classification accuracy of the minority class while keeping the overall classification accuracy high, and that RS-MetaCost with oversampling outperforms RS-MetaCost with undersampling.
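The pipeline in the abstract (resample first, then run MetaCost's bagging and relabelling, with m-estimated probabilities) can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: random oversampling as the resampling method, a k-nearest-neighbour vote as the base learner, and a uniform class prior in the m-estimate p = (n_c + m * prior) / (n + m). The function names (`random_oversample`, `knn_proba`, `rs_metacost_relabel`) and the parameters `n_bags`, `k`, and `m` are hypothetical. The cost matrix follows the usual MetaCost convention, where `cost[i][j]` is the cost of predicting class i when the true class is j.

```python
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Random oversampling: duplicate randomly chosen rows of each smaller
    class until every class is as large as the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def m_estimate(n_c, n, prior, m):
    """m-estimate of a class probability: (n_c + m * prior) / (n + m)."""
    return (n_c + m * prior) / (n + m)


def knn_proba(X_train, y_train, x, classes, k, m):
    """Class-probability estimates from the k nearest training points,
    smoothed with the m-estimate under a uniform prior (an assumption)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in ranked[:k])
    n = sum(votes.values())
    prior = 1.0 / len(classes)
    return [m_estimate(votes[c], n, prior, m) for c in classes]


def rs_metacost_relabel(X, y, cost, n_bags=10, k=3, m=2.0, seed=0):
    """Relabel X the MetaCost way, after resampling the training data.

    cost[i][j] is the cost of predicting classes[i] when the true class is
    classes[j]. Returns the new labels; training the final classifier on
    (X, new labels) is left out of this sketch."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    K = len(classes)
    # Step 1: resample (here: random oversampling) before drawing subsets.
    Xb, yb = random_oversample(X, y, rng)
    # Step 2: bootstrap bags drawn from the rebalanced data.
    bags = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(Xb)) for _ in range(len(Xb))]
        bags.append(([Xb[i] for i in idx], [yb[i] for i in idx]))
    new_y = []
    for x in X:
        # Step 3: average the m-estimated probabilities over all bags.
        probs = [0.0] * K
        for bx, by in bags:
            for j, p in enumerate(knn_proba(bx, by, x, classes, k, m)):
                probs[j] += p / n_bags
        # Step 4: choose the class with minimum expected cost,
        # sum_j P(j | x) * cost[i][j].
        expected = [sum(probs[j] * cost[i][j] for j in range(K)) for i in range(K)]
        new_y.append(classes[min(range(K), key=expected.__getitem__)])
    return new_y


X = [[0], [1], [2], [3], [4], [5], [20], [21]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
cost = [[0, 2], [1, 0]]  # missing a minority example costs twice as much
print(rs_metacost_relabel(X, y, cost, n_bags=30))
```

The m-estimate never lets a class probability drop to zero: with k = 3 neighbours that are all majority-class and m = 2, the minority class still receives (0 + 2 * 0.5) / (3 + 2) = 0.2, which is the kind of lift for the minority class that the abstract describes. Swapping `random_oversample` for a random undersampling routine would give the undersampled variant.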
Keywords: imbalanced classification; MetaCost; resampling; m-estimation
CLC number: TP301.6 [Automation and Computer Technology / Computer System Architecture]