Authors: SUN Bo (孙博); ZHOU Qian (周倩); CHEN Hai-yan (陈海燕)[2]
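Affiliations: [1] Department of Computer Science and Technology, Shandong Agricultural University, Taian, Shandong 271018, China; [2] College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China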
Source: Control Theory & Applications (《控制理论与应用》), 2024, No. 11, pp. 2139-2146 (8 pages)
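Funding: Supported by the Natural Science Foundation of Shandong Province (ZR2023MF098, ZR2018QF002) and the Major Science and Technology Innovation Project of Shandong Province (2019JZZY010706).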
Abstract: Classification is an important learning task in machine learning: a classifier learned on a set of training examples is used to predict the class labels of test examples. In many practical applications, however, the collected training sets have imbalanced class distributions, which usually limits the classification performance of most classifier learning algorithms. To alleviate this problem, this paper proposes COA-RBU, an imbalanced data learning approach that takes the class overlap degree as its optimization objective. COA-RBU uses the mutual class potential to evaluate the utility of each majority class example and adaptively determines a suitable undersampling ratio according to the class overlap degree of the training set, with the aim of reducing the data complexity of the imbalanced training set. Experimental results indicate that the class overlap degree reflects the learning difficulty of an imbalanced dataset well, and that COA-RBU is both effective and efficient. This work therefore offers a new way to determine a suitable undersampling ratio from the perspective of class overlap as a form of data complexity.
Keywords: classification; class imbalance; undersampling; class overlap degree; data complexity; machine learning
CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
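The abstract names two ingredients: scoring each majority class example by a mutual class potential, and choosing the undersampling ratio from the class overlap degree. The Python sketch below illustrates that idea under stated assumptions and is not the authors' COA-RBU. Assumed here: the potential takes the Gaussian radial-basis form Phi(x) = sum over majority of exp(-(||x - x_i||/gamma)^2) minus the same sum over minority, as in radial-based undersampling (RBU); majority examples with the lowest potential (those most surrounded by minority examples) are removed in one pass rather than iteratively; and the paper's class overlap degree is replaced by a hypothetical proxy, the fraction of examples whose nearest neighbour has the other label. All function names, the candidate ratios and `gamma` are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mutual_class_potential(x: Point, majority: Sequence[Point],
                           minority: Sequence[Point], gamma: float) -> float:
    """Assumed RBU-style potential: majority RBF sum minus minority RBF sum at x.

    When x is itself a majority example, its own term exp(0) = 1 is included;
    it shifts every majority score equally, so the ranking is unaffected.
    """
    maj = sum(math.exp(-sq_dist(x, m) / gamma ** 2) for m in majority)
    mino = sum(math.exp(-sq_dist(x, n) / gamma ** 2) for n in minority)
    return maj - mino


def undersample(majority: Sequence[Point], minority: Sequence[Point],
                ratio: float, gamma: float) -> List[Point]:
    """Drop round(ratio * |majority|) majority examples with the lowest potential.

    One-shot ranking (hypothetical simplification); at least one majority
    example is always kept, and the survivors keep their original order.
    """
    n_remove = max(0, min(round(ratio * len(majority)), len(majority) - 1))
    scores = [mutual_class_potential(p, majority, minority, gamma) for p in majority]
    order = sorted(range(len(majority)), key=lambda i: scores[i])
    removed = set(order[:n_remove])
    return [p for i, p in enumerate(majority) if i not in removed]


def overlap_degree(majority: Sequence[Point], minority: Sequence[Point]) -> float:
    """Hypothetical overlap proxy: share of examples whose nearest neighbour
    (excluding the example itself) carries the other class label."""
    data = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    if len(data) < 2:
        return 0.0
    mismatches = 0
    for i, (p, label) in enumerate(data):
        nearest = min((j for j in range(len(data)) if j != i),
                      key=lambda j: sq_dist(p, data[j][0]))
        if data[nearest][1] != label:
            mismatches += 1
    return mismatches / len(data)


def choose_ratio(majority: Sequence[Point], minority: Sequence[Point],
                 candidates: Sequence[float],
                 gamma: float) -> Tuple[Optional[float], List[Point]]:
    """Pick the candidate ratio whose undersampled set has the lowest overlap proxy.

    Ties go to the earlier (here: smaller) candidate, so no more majority
    examples are discarded than needed.
    """
    best_ratio, best_kept, best_overlap = None, list(majority), math.inf
    for ratio in candidates:
        kept = undersample(majority, minority, ratio, gamma)
        overlap = overlap_degree(kept, minority)
        if overlap < best_overlap:
            best_ratio, best_kept, best_overlap = ratio, kept, overlap
    return best_ratio, best_kept


if __name__ == "__main__":
    # Toy 1-D data: two majority examples sit inside the minority region.
    majority = [(0.0,), (0.1,), (0.2,), (0.3,), (2.0,), (2.1,)]
    minority = [(2.05,), (2.15,)]
    ratio, kept = choose_ratio(majority, minority, [0.0, 0.2, 0.4, 0.6], gamma=0.5)
    print(ratio, kept)  # 0.4 [(0.0,), (0.1,), (0.2,), (0.3,)]
```

On this toy data the two overlapping majority examples have the lowest potential, so removing 40% of the majority class already brings the overlap proxy to zero; the larger candidate ratio reaches the same proxy value but is not chosen because it discards more data. The paper's own potential definition, overlap measure and ratio search should be taken from its full text.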