Authors: QIU Canhua (邱灿华) [1]; WU Jie (吴杰) [1]
Affiliation: [1] Tongji University, Shanghai 200092, China
Source: Computer & Network (《计算机与网络》), 2022, No. 7, pp. 62-66 (5 pages)
Abstract: To address problems in the traditional Synthetic Minority Oversampling Technique (SMOTE), such as ignoring between-class and within-class imbalance and being unable to control noise in the synthesized samples, an improved SMOTE algorithm based on the DBSCAN clustering algorithm is proposed. DBSCAN is first used to cluster the minority-class samples. The minority-class density coefficient and weights are then calculated to assign each cluster the number of samples to generate. The sample points in each cluster are divided into two categories according to their distance to the cluster centroid, and the points in each category are assigned different random coefficients for oversampling, yielding a new, more balanced dataset. Experiments on the collected datasets show that the improved algorithm markedly improves the classification performance of the classifier.
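Keywords: SMOTE algorithm; DBSCAN algorithm; imbalanced dataset; oversampling
Classification code: TP181 [Automation and Computer Technology - Control Theory and Control Engineering]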
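The pipeline outlined in the abstract (cluster the minority class with DBSCAN, allocate a sample count per cluster, then interpolate with a coefficient that depends on distance to the centroid) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the density-coefficient formula, the weights, or the random-coefficient ranges, so the mean-distance sparsity weight, the median split, and the `1.0` / `0.5` upper bounds below are all assumptions, as are the function names `dbscan`, `allocate`, and `dbscan_smote`. Only the Python standard library is used.

```python
import math
import random

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                      # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster             # noise becomes a border point
            if labels[j] is not None:
                continue                        # already assigned, do not expand
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:              # core point: keep expanding
                queue.extend(nb)
    return labels

def centroid(cluster):
    return [sum(c) / len(cluster) for c in zip(*cluster)]

def allocate(clusters, n_new):
    """ASSUMPTION: weight each cluster by its mean distance to the centroid,
    so sparser clusters receive more synthetic samples."""
    sparsity = [sum(math.dist(p, centroid(c)) for p in c) / len(c) for c in clusters]
    total = sum(sparsity)
    if total > 0:
        weights = [s / total for s in sparsity]
    else:
        weights = [1 / len(clusters)] * len(clusters)
    counts = [int(n_new * w) for w in weights]
    counts[weights.index(max(weights))] += n_new - sum(counts)  # hand out remainder
    return counts

def dbscan_smote(minority, n_new, eps=1.0, min_pts=3, k=3, seed=0):
    """Generate n_new synthetic minority points; DBSCAN noise is never used."""
    rng = random.Random(seed)
    labels = dbscan(minority, eps, min_pts)
    n_clusters = max(labels, default=-1) + 1
    clusters = [[p for p, l in zip(minority, labels) if l == c]
                for c in range(n_clusters)]
    if not clusters:
        return []
    synthetic = []
    for cluster, count in zip(clusters, allocate(clusters, n_new)):
        center = centroid(cluster)
        dists = sorted(math.dist(p, center) for p in cluster)
        median = dists[len(dists) // 2]
        for _ in range(count):
            base = rng.choice(cluster)
            mates = sorted((q for q in cluster if q is not base),
                           key=lambda q: math.dist(base, q))[:k]
            if not mates:                       # single-point cluster: duplicate it
                synthetic.append(list(base))
                continue
            nbr = rng.choice(mates)
            # ASSUMPTION: points far from the centroid get a smaller coefficient
            # range, keeping their synthetic samples close to real data.
            hi = 1.0 if math.dist(base, center) <= median else 0.5
            gap = rng.uniform(0.0, hi)
            synthetic.append([b + gap * (n - b) for b, n in zip(base, nbr)])
    return synthetic
```

Calling `dbscan_smote(minority_points, n_new=50)` on a list of feature vectors returns 50 synthetic points, provided DBSCAN finds at least one cluster. Because neighbors are drawn only from the same cluster, interpolation never bridges two separate minority regions, which is one way to limit the noisy samples that plain SMOTE can produce.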