Authors: XIAO Zhihong (肖枝洪); LI Ji (李季); WANG Yichao (王一超)
Affiliations: [1] School of Science, Chongqing University of Technology, Chongqing 400054, China [2] Chifeng Zhaowuda High School, Chifeng, Inner Mongolia 024099, China
Source: Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), 2022, No. 7, pp. 281-292 (12 pages)
Funding: Key Project of the National Social Science Fund of China (17AJY028); Chongqing University of Technology Graduate Innovation Project (clgycx20203142).
Abstract: To address the shortcomings of the random forest (RF) algorithm and of oversampling techniques on high-dimensional imbalanced data, a new algorithm is proposed. First, the MAG algorithm is introduced by combining the Gini coefficient of the RF model with out-of-bag (OOB) accuracy, and it is used to reduce the dimensionality of the high-dimensional data. Second, the center SMOTE algorithm is improved with the piecewise dynamic deviation square sum (PDSSD) method to optimize the structure of the minority-class samples, so that the data become low-dimensional and balanced. Finally, the least squares support vector machine (LSSVM) and RF are used to classify the integrated data in order to verify the effectiveness of the proposed algorithm. Experimental results with the RF and LSSVM classifiers show that data integrated by the proposed MAG-PDSSD-SMOTE algorithm achieve significantly higher F-value, G-mean, and Accuracy than existing methods. In terms of time complexity, MAG-PDSSD-SMOTE is slightly more costly than existing methods but remains of the same order of magnitude.
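Keywords: MAG-PDSSD-SMOTE algorithm; random forest; high-dimensional imbalanced data; data processing
Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]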
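The abstract reports results as F-value, G-mean, and Accuracy but does not define them. The following is a minimal sketch of how these three scores are commonly computed from binary confusion-matrix counts, assuming the minority class is treated as the positive class. The function name `imbalance_metrics` and its parameters are hypothetical, and the paper's own definitions may differ (for example, a weighted F-value).

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Return (f_value, g_mean, accuracy) from binary confusion-matrix counts.

    Assumption: the minority class is the positive class, and the F-value is
    the balanced F1 score. These are common conventions, not taken from the paper.
    """
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate

    # Harmonic mean of precision and recall.
    if precision + recall:
        f_value = 2 * precision * recall / (precision + recall)
    else:
        f_value = 0.0

    # Geometric mean of the per-class recalls; low if either class is neglected.
    g_mean = math.sqrt(recall * specificity)

    accuracy = (tp + tn) / total if total else 0.0
    return f_value, g_mean, accuracy


if __name__ == "__main__":
    # Illustrative counts only: 10 minority and 90 majority samples.
    f_value, g_mean, accuracy = imbalance_metrics(tp=8, fn=2, fp=4, tn=86)
    print(f"F-value={f_value:.4f}  G-mean={g_mean:.4f}  Accuracy={accuracy:.4f}")
```

On imbalanced data, Accuracy alone can stay high even when most minority samples are misclassified, which is why it is usually read alongside F-value and G-mean.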