Authors: 徐红 (XU Hong), 矫桂娥 (JIAO Guie) [2,3], 张文俊 (ZHANG Wenjun)
Affiliations: [1] School of Information, Shanghai Ocean University, Shanghai 201306, China; [2] School of Information, Shanghai Jianqiao University, Shanghai 201306, China; [3] Shanghai Film Academy, Shanghai University, Shanghai 200072, China
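Source: 《应用科学学报》 (Journal of Applied Sciences), 2023, No. 4, pp. 657-668 (12 pages)
Funding: Supported by the University-Level Key Research Project (No. sjq17007) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. SJCX20_1352).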
Abstract: Imbalanced data classification is a challenging task in big data mining. The distribution of imbalanced data seriously affects the classification performance of models, especially on minority classes. To improve classification performance on imbalanced data, this paper proposes an expectation-maximization weighted resampling (EMWRS) algorithm and a weighted cross entropy loss (WCELoss) function. In the preprocessing stage, a Gaussian mixture model is used to learn the distribution of the data; sample weights within each cluster are derived from the clustering result, and the data are resampled according to the sample distribution and the corresponding weights, which reduces the degree of imbalance in the dataset. Different cost losses are then assigned to the minority and majority classes according to sample-proportion weights, and a convolutional neural network model is constructed on this basis. The constructed network is evaluated with F1 and G-mean as indicators and compared with several classic algorithms, including SMOTE (synthetic minority over-sampling technique) and ADASYN (adaptive synthetic sampling), on the Adult dataset from the UCI (University of California, Irvine) public repository. The proposed model ranks first on both indicators, which indicates that it effectively improves classification accuracy on the minority class.
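Keywords: imbalanced data; Gaussian mixture model; sample weighting; cost loss; convolutional neural network
Classification code: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]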
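
Illustrative sketch: The abstract names a class-weighted cross-entropy loss and the F1 and G-mean indicators but does not give formulas. The Python below is a minimal sketch of those ideas under stated assumptions, not the authors' WCELoss or EMWRS implementation. The inverse-frequency weighting rule and the function names `class_weights`, `weighted_cross_entropy`, and `f1_and_gmean` are hypothetical choices made for illustration; the F1 and G-mean definitions are the standard binary ones (G-mean as the square root of recall times specificity).

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count).

    Assumption: the paper's exact weighting rule is not given in the abstract.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted mean of -w[y] * log p(y) over all samples.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    total = sum(-weights[y] * math.log(max(p[y], eps))
                for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


def f1_and_gmean(y_true, y_pred, positive=1):
    """Binary F1 and G-mean, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return f1, math.sqrt(recall * specificity)


# Usage on a toy 3:1 imbalanced label set (made-up data, for illustration only).
labels = [0, 0, 0, 1]
weights = class_weights(labels)
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
loss = weighted_cross_entropy(probs, labels, weights)
f1, gmean = f1_and_gmean([1, 1, 0, 0], [1, 0, 0, 0])
```

With this weighting, a misclassified minority sample contributes more to the loss than a misclassified majority sample, which is the general effect the abstract attributes to assigning different cost losses to the two classes.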