Authors: LI Xin (李欣); YU Wei-qin (俞卫琴) (College of Mathematics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China)
Affiliation: [1] College of Mathematics and Statistics, Shanghai University of Engineering Science, Shanghai 201620
Source: Computer Engineering and Design (《计算机工程与设计》), 2021, No. 8, pp. 2218-2223 (6 pages)
Funding: National Natural Science Foundation of China (11602134, 11772148); National Statistical Science Research Project, General Program (2018LY16).
Abstract: Traditional methods for handling imbalanced data are prone to overfitting and underfitting. To address this, an imbalanced data classification method based on statistical-information clustering boundaries is proposed. First, noise points are removed from the data. The neighborhood radius of each data object is then set from its k-distance, and the k-distance statistics of the objects within that neighborhood are used to separate boundary points from non-boundary points. Boundary points of the minority class serve as samples for SMOTE oversampling, while distance-based undersampling deletes majority-class points far from the boundary, yielding a balanced dataset. Comparative experiments show that the method improves both G-mean and F-value.
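The abstract names the steps of the method but not its exact statistics, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. The noise cutoff (mean + 2 standard deviations of all k-distances), the boundary rule (a point whose k-distance exceeds the mean k-distance of the points inside its k-distance neighborhood), the "meet halfway" target class size, and every function name (`k_distances`, `split_boundary`, `smote`, `rebalance`, `g_mean_f_value`) are hypothetical. The oversampling step is standard SMOTE interpolation. G-mean and F-value are computed with the minority class as positive, and F-value is assumed to be F1 (beta = 1).

```python
import math
import random
from statistics import mean, pstdev


def k_distances(X, k):
    """k-distance of every point: distance to its k-th nearest other point."""
    return [sorted(math.dist(p, q) for j, q in enumerate(X) if j != i)[k - 1]
            for i, p in enumerate(X)]


def split_boundary(X, k, noise_factor=2.0):
    """Return (boundary, inner) index lists after dropping noise points.

    Hypothetical rules (the exact statistics are not given in the abstract):
    - noise: k-distance above mean + noise_factor * std over all points;
    - boundary: own k-distance larger than the mean k-distance of the kept
      points inside its neighborhood, whose radius is its own k-distance.
    """
    kd = k_distances(X, k)
    cutoff = mean(kd) + noise_factor * pstdev(kd)
    keep = [i for i in range(len(X)) if kd[i] <= cutoff]
    boundary, inner = [], []
    for i in keep:
        nbrs = [j for j in keep if j != i and math.dist(X[i], X[j]) <= kd[i]]
        if nbrs and kd[i] > mean(kd[j] for j in nbrs):
            boundary.append(i)
        else:
            inner.append(i)
    return boundary, inner


def smote(seeds, pool, n_new, k, rng):
    """Standard SMOTE: interpolate from a seed towards one of its k nearest pool points.

    Assumes the pool holds at least two distinct points.
    """
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(seeds)
        nbrs = sorted((p for p in pool if p != x), key=lambda p: math.dist(x, p))[:k]
        nb, u = rng.choice(nbrs), rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


def rebalance(X, y, minority_label, k=3, seed=0):
    """Oversample from minority boundary points; undersample majority points far from the boundary."""
    rng = random.Random(seed)
    boundary, inner = split_boundary(X, k)
    kept = boundary + inner
    mino = [X[i] for i in kept if y[i] == minority_label]
    majo = [(X[i], y[i]) for i in kept if y[i] != minority_label]
    # SMOTE seeds: minority boundary points, or all minority points if none were found.
    seeds = [X[i] for i in boundary if y[i] == minority_label] or mino
    # Undersampling anchors: boundary points of either class, else the minority points.
    anchors = [X[i] for i in boundary] or mino
    target = (len(mino) + len(majo)) // 2  # hypothetical: both classes meet halfway
    new_min = mino + smote(seeds, mino, max(0, target - len(mino)), k, rng)
    # Distance-based undersampling: keep the majority points closest to any anchor.
    majo.sort(key=lambda pl: min(math.dist(pl[0], a) for a in anchors))
    new_maj = majo[:target]
    X_bal = new_min + [p for p, _ in new_maj]
    y_bal = [minority_label] * len(new_min) + [lab for _, lab in new_maj]
    return X_bal, y_bal


def g_mean_f_value(y_true, y_pred, positive):
    """G-mean = sqrt(recall * specificity); F-value = F1 with `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_value = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return math.sqrt(recall * specificity), f_value
```

Calling `rebalance(X, y, minority_label=1)` returns points and labels with equal class counts, provided the minority class is genuinely the smaller one and keeps at least two distinct points after noise removal. The brute-force O(n²) distance scans keep the sketch short; a practical version would use a spatial index for the neighbor queries.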
CLC number: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]