Authors: LI Guo-he [1,2]; CHEN Gui-ting; ZHENG Yi-feng; HONG Yun-feng; ZHOU Xiao-ming; PAN Xue-ling
Affiliations: [1] Beijing Key Lab of Petroleum Data Mining, China University of Petroleum-Beijing, Beijing 102249, China; [2] College of Information Science and Engineering, China University of Petroleum-Beijing at Karamay, Karamay 834000, Xinjiang, China; [3] College of Computer Science, Minnan Normal University, Zhangzhou 363000, Fujian, China; [4] Application Research Institute, Hangzhou Shibei Intellectual Property Service Limited Company, Hangzhou 310010, Zhejiang, China; [5] Applied Research Institute, Xiamen Hanying Internet of Things, Xiamen 361021, Fujian, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2023, No. 9, pp. 2626-2633 (8 pages)
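Funding: National Natural Science Foundation of China (60473125, 61701213); Research Start-up Fund of China University of Petroleum-Beijing at Karamay (RCYJ2016B-03-001); Natural Science Foundation of Fujian Province (2021J011004, 2021J011002).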
Abstract: To address sample class imbalance, a class equalization algorithm based on sample distribution is proposed. A one-class support vector machine and the nearest neighbor method are used to learn the majority-class samples and to clean up class boundaries where the distribution is ambiguous. A density-based clustering algorithm groups the minority-class samples into clusters; the weight of each cluster determines how many samples are generated for it, which balances sample counts across clusters. Within each cluster, the ratio of boundary samples to non-boundary samples determines the weight of each sample, and SMOTE is then used to synthesize minority-class samples. Comparative experiments on UCI datasets and an application to earthquake data analysis show that the algorithm can improve classification accuracy across different classifiers, especially on imbalanced data.
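Keywords: imbalanced data; oversampling; one-class support vector machine; density clustering; sample class equalization; sample distribution; classification
Classification No.: TP306.1 [Automation and Computer Technology: Computer System Architecture]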
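The cluster-wise oversampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the minority samples are already grouped into clusters, and it assumes a cluster weight inversely proportional to cluster size, since the abstract does not define the weight. The one-class SVM cleaning of the majority class and the boundary-based per-sample weights are omitted, and all function names are hypothetical.

```python
import math
import random


def allocate_per_cluster(cluster_sizes, n_new):
    """Split n_new synthetic samples across clusters.

    ASSUMPTION: weight is inversely proportional to cluster size, so sparse
    clusters receive more synthetic samples. The paper's weight may differ.
    """
    inverse = [1.0 / size for size in cluster_sizes]
    total = sum(inverse)
    counts = [int(n_new * w / total) for w in inverse]
    # Give any rounding remainder to the smallest clusters first.
    leftover = max(0, n_new - sum(counts))
    smallest_first = sorted(range(len(cluster_sizes)),
                            key=lambda i: cluster_sizes[i])
    for i in smallest_first[:leftover]:
        counts[i] += 1
    return counts


def smote_in_cluster(cluster, n_new, k, rng):
    """SMOTE-style interpolation restricted to one cluster.

    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest neighbours in the same cluster.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(cluster)
        others = [p for p in cluster if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        if not neighbours:  # singleton cluster: nothing to interpolate with
            synthetic.append(tuple(base))
            continue
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def balance_minority(clusters, n_new, k=5, seed=0):
    """Return one list of synthetic samples per minority cluster."""
    rng = random.Random(seed)
    counts = allocate_per_cluster([len(c) for c in clusters], n_new)
    return [smote_in_cluster(c, n, k, rng) for c, n in zip(clusters, counts)]


# Toy data (made up for illustration): two minority clusters of sizes 4 and 2.
clusters = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    [(10.0, 10.0), (11.0, 10.0)],
]
generated = balance_minority(clusters, n_new=6)
```

Restricting interpolation to neighbours within the same cluster keeps synthetic samples inside regions the minority class already occupies, which is the motivation for clustering before applying SMOTE.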