Authors: LI Xu; CHEN Jiadui[1]; WU Yongming[1,2,3]; ZONG Wenze
Affiliations: [1] Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China; [2] College of Mechanical Engineering, Guizhou University, Guiyang 550025, China; [3] State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
Source: Computer Engineering and Applications, 2022, No. 16, pp. 284-291 (8 pages)
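Funding: Guizhou Provincial Science and Technology Support Program ((2017)2029, [2021] General 439); Guizhou Provincial Science and Technology Plan Project (Qiankehe Platform JXCX[2021]001).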
Abstract: Imbalanced data analysis is one of the key technologies of intelligent manufacturing, and the classification of imbalanced data has become a research hotspot in machine learning and data mining. Current oversampling strategies for imbalanced data produce synthetic samples that suffer from marginalization and require denoising. To address this, this paper proposes an oversampling algorithm based on an improved SMOTE (synthetic minority oversampling technique) and the local outlier factor (LOF). First, K-means clustering is performed on the entire dataset, and high-reliability samples are selected for oversampling with the improved SMOTE algorithm; then the LOF algorithm is used to delete synthetic samples with large errors. Experimental results on four UCI imbalanced datasets show that the method classifies the minority class of imbalanced data more effectively and overcomes the data marginalization problem. When applied to imbalanced data from phosphoric acid production, the algorithm classifies these data accurately.
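The abstract outlines a three-stage pipeline (K-means screening, improved SMOTE, LOF filtering) but does not specify the paper's reliability criterion or its SMOTE modification. The following is a minimal NumPy sketch of the general idea, not the authors' method, built on several assumptions. It uses standard SMOTE, which interpolates x_new = x_i + λ(x_nn − x_i) with λ drawn from U(0, 1), in place of the improved variant. It applies a hypothetical rule that a minority sample counts as "reliable" if its K-means cluster is at least `min_ratio` minority. It also applies a hypothetical LOF cut-off `lof_threshold`, since an LOF well above 1 means a point is less dense than its neighbours. All function names and parameter values are illustrative.

```python
import numpy as np


def pairwise(A, B):
    """Euclidean distance between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's K-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = pairwise(X, centers).argmin(axis=1)  # nearest-center assignment
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def smote(X_min, n_new, k, rng):
    """Standard SMOTE: x_new = x_i + lam * (x_nn - x_i), lam ~ U(0, 1)."""
    d = pairwise(X_min, X_min)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]            # k nearest minority neighbours
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = nn[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))
    return X_min[base] + lam * (X_min[nbr] - X_min[base])


def lof_scores(X, k):
    """Local outlier factor of each row of X; values well above 1 suggest outliers."""
    n = len(X)
    d = pairwise(X, X)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    k_dist = d[np.arange(n), nn[:, -1]]          # distance to the k-th neighbour
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
    reach = np.maximum(k_dist[nn], d[np.arange(n)[:, None], nn])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)     # local reachability density
    return lrd[nn].mean(axis=1) / lrd


def kmeans_smote_lof(X, y, minority=1, n_clusters=4, min_ratio=0.5,
                     k=5, lof_threshold=1.5, seed=0):
    """Hypothetical pipeline: K-means screening -> standard SMOTE -> LOF filtering."""
    rng = np.random.default_rng(seed)
    is_min = y == minority
    labels = kmeans(X, n_clusters, rng)          # cluster the entire dataset
    # Hypothetical reliability rule: keep minority samples whose cluster
    # is at least `min_ratio` minority.
    reliable = np.zeros(len(X), dtype=bool)
    for j in range(n_clusters):
        in_j = labels == j
        if in_j.any() and is_min[in_j].mean() >= min_ratio:
            reliable |= in_j & is_min
    # Fall back to all minority samples if too few were deemed reliable.
    X_seed = X[reliable] if reliable.sum() > k else X[is_min]
    n_new = int((~is_min).sum() - is_min.sum())  # aim for a 1:1 class ratio
    if n_new <= 0:
        return X, y
    synth = smote(X_seed, n_new, min(k, len(X_seed) - 1), rng)
    # Score synthetic points against the real minority class plus all synthetic
    # points, and drop those whose LOF exceeds the hypothetical threshold.
    pool = np.vstack([X[is_min], synth])
    scores = lof_scores(pool, k)[is_min.sum():]
    synth = synth[scores <= lof_threshold]
    X_out = np.vstack([X, synth])
    y_out = np.concatenate([y, np.full(len(synth), minority, dtype=y.dtype)])
    return X_out, y_out


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (20, 2))])
    y = np.array([0] * 200 + [1] * 20)
    X_bal, y_bal = kmeans_smote_lof(X, y)
    print("class counts after resampling:", np.bincount(y_bal))
```

Because LOF filtering runs after generation, the final minority count may fall short of a 1:1 ratio. Whether to regenerate the removed samples is a design choice that the abstract does not settle.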
Keywords: imbalanced data; oversampling; local outlier factor; clustering; synthetic minority oversampling technique (SMOTE)
CLC number: TP399 [Automation and Computer Technology: Computer Application Technology]