Authors: Xing Yan (邢延) [1], Chen Jiafeng (陈嘉锋), Jia Xiaoyan (贾小彦) [1], Wang Xin (汪新) [2]
Affiliations: [1] School of Automation, Guangdong University of Technology, Guangzhou 510006, China; [2] School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou 510006, China
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), 2018, No. 5, pp. 936-944 (9 pages)
Funding: National Natural Science Foundation of China (51378128); Natural Science Foundation of Guangdong Province (2015A030313498)
Abstract: Class overlap is the degree to which data from different classes intersect and mix. Its quantitative measures fall into two groups, those based on geometrical statistics and those based on information theory, and they are used to gauge how difficult a classification task is. Real classification tasks contain a large amount of imbalanced data, and the large disparity in sample counts between majority and minority classes makes classification very difficult. This paper uses an experimental approach to verify whether class overlap measures can effectively guide imbalanced data classification, with the aim of reducing or even avoiding the heavy computational cost of blind trial and error. First, for two-class problems, validation experiments are designed to study the effectiveness of the measures on synthetic imbalanced data generated with different imbalance ratios, class boundary shapes, feature types, and probability distributions. Second, based on these experiments, the influence of data imbalance on the measures is analyzed, and effective ways of using the measures to guide imbalanced classification are identified. Finally, the practical effect of this guidance is verified on real-world imbalanced data sets. The experimental results show that class overlap measures that are strongly robust to the imbalance ratio can effectively guide classifier selection for imbalanced data.
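Keywords: class overlap; classification complexity; imbalanced data; classification; imbalance ratio
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]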
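
The abstract does not name the specific class overlap measures it evaluates. As an illustration only, the sketch below computes one well-known geometry-based measure, Fisher's discriminant ratio, on synthetic two-class Gaussian data while the imbalance ratio is varied. The choice of measure, the class centers, the spread `sigma`, the sample sizes, and all function names are assumptions made for this example and are not taken from the paper.

```python
import random
from statistics import fmean, pvariance


def fisher_ratio(a, b):
    """Fisher's discriminant ratio for one feature: (mean gap)^2 / (sum of variances)."""
    gap = fmean(a) - fmean(b)
    spread = pvariance(a) + pvariance(b)
    return gap * gap / spread if spread > 0 else float("inf")


def max_fisher_ratio(major, minor):
    """Largest per-feature Fisher ratio; higher values suggest less class overlap."""
    n_features = len(major[0])
    return max(
        fisher_ratio([row[j] for row in major], [row[j] for row in minor])
        for j in range(n_features)
    )


def gaussian_class(n, center, rng, sigma=1.0):
    """Draw n points from an isotropic Gaussian around center (hypothetical setup)."""
    return [[rng.gauss(c, sigma) for c in center] for _ in range(n)]


def overlap_vs_imbalance(ratios=(1, 10, 50), n_major=2000, seed=0):
    """Return {imbalance ratio: max Fisher ratio} for a fixed pair of class centers."""
    rng = random.Random(seed)
    major = gaussian_class(n_major, (0.0, 0.0), rng)
    result = {}
    for r in ratios:
        # The minority class shrinks as the imbalance ratio grows.
        minor = gaussian_class(max(2, n_major // r), (2.0, 0.0), rng)
        result[r] = max_fisher_ratio(major, minor)
    return result


if __name__ == "__main__":
    for ratio, f1 in overlap_vs_imbalance().items():
        print(f"imbalance ratio {ratio}:1 -> max Fisher ratio {f1:.2f}")
```

Because the underlying class distributions are held fixed here, any change in the value across imbalance ratios comes from the reduced minority sample alone. That is the sense in which a measure can be called more or less robust to the imbalance ratio.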