Authors: 粟佳 (SU Jia); 于洪 (YU Hong) [1]
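Affiliation: [1] Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China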
Source: Journal of Computer Applications (《计算机应用》), 2024, No. 5, pp. 1423-1427 (5 pages)
Funding: National Key Research and Development Program of China (2021YFF0704103); National Natural Science Foundation of China (62136002, 62233018).
Abstract: In practical applications, various factors can cause data to go missing, which affects the analysis in downstream tasks, so imputing missing values in datasets is particularly important. An incorrect imputed value, however, can bias the analysis more severely than leaving the value unimputed. To address this, a new missing value imputation algorithm with dual discriminators, DDC-GAIN (Dual Discriminator based on C-GAIN), is proposed on the basis of the Conditional Generative Adversarial Imputation Network (C-GAIN). In DDC-GAIN, an auxiliary discriminator assists the primary discriminator in judging whether predicted values are real or generated: it judges the authenticity of a generated sample from that sample's global information, placing more emphasis on the relationships between features when estimating predicted values. In comparison experiments against five classical imputation algorithms on four datasets, DDC-GAIN achieves the lowest Root Mean Square Error (RMSE) under the same conditions when the sample size is large; on the Default credit card dataset at a missing rate of 15%, the RMSE of DDC-GAIN is 28.99% lower than that of the second-best algorithm, C-GAIN. This indicates that using the auxiliary discriminator to help the primary discriminator learn the relationships between features is effective.
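As a reading aid for the RMSE figures quoted in the abstract, the sketch below shows one common way such numbers are obtained. It is an illustration under stated assumptions, not the authors' evaluation code: it assumes RMSE is computed only over entries that were originally missing (mask value 0), and that "28.99% lower" means a relative reduction against the baseline RMSE. The function names `masked_rmse` and `relative_reduction` and the toy data are hypothetical.

```python
import math


def masked_rmse(truth, imputed, mask):
    """RMSE over the entries where mask == 0 (assumed to mean 'was missing')."""
    squared_errors = [
        (t - p) ** 2
        for t_row, p_row, m_row in zip(truth, imputed, mask)
        for t, p, m in zip(t_row, p_row, m_row)
        if m == 0
    ]
    if not squared_errors:
        raise ValueError("mask marks no missing entries")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse * 100.0


# Toy data (made up for illustration): 1 = observed, 0 = missing.
truth = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1, 0], [0, 1]]
imputed_baseline = [[1.0, 2.5], [2.5, 4.0]]    # hypothetical baseline imputer
imputed_improved = [[1.0, 2.25], [2.75, 4.0]]  # hypothetical improved imputer

rmse_baseline = masked_rmse(truth, imputed_baseline, mask)
rmse_improved = masked_rmse(truth, imputed_improved, mask)
reduction = relative_reduction(rmse_baseline, rmse_improved)
```

Restricting the error to masked entries keeps observed values, which any imputer can copy through unchanged, from diluting the comparison between methods.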
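Keywords: conditional generative adversarial imputation network; missing value imputation; incompleteness; feature relationship; dual discriminator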
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]