Missing Value Imputation Algorithm Using Dual Discriminators Based on Conditional Generative Adversarial Imputation Network


Authors: SU Jia (粟佳), YU Hong (于洪) [1]

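Affiliation: [1] Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China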

Source: Journal of Computer Applications (《计算机应用》), 2024, No. 5, pp. 1423-1427 (5 pages)

Funding: National Key Research and Development Program of China (2021YFF0704103); National Natural Science Foundation of China (62136002, 62233018).

Abstract: In real applications, various factors can cause data to go missing, which affects the analysis in downstream tasks, so imputing the missing values in a dataset is particularly important. Incorrect imputed values, however, can bias the analysis more severely than leaving the data unimputed. To address this, a new missing value imputation algorithm, DDC-GAIN (Dual Discriminator based on C-GAIN), is proposed; it builds on the Conditional Generative Adversarial Imputation Network (C-GAIN) and uses dual discriminators. An auxiliary discriminator assists the primary discriminator in judging whether predicted values are real or generated: it judges the authenticity of a generated sample from that sample's global information, placing more emphasis on the relationships between features, and the predicted values are estimated on this basis. In comparative experiments on four datasets against five classical imputation algorithms, DDC-GAIN achieves the lowest Root Mean Square Error (RMSE) under the same conditions when the sample size is large; on the Default credit card dataset at a missing rate of 15%, the RMSE of DDC-GAIN is 28.99% lower than that of the second-best algorithm, C-GAIN. This indicates that using the auxiliary discriminator to help the primary discriminator learn the relationships between features is effective.
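The abstract reports results as RMSE at a given missing rate and as a percentage reduction relative to a baseline. The following is a minimal sketch of how such figures are commonly computed, not the authors' evaluation code. It assumes RMSE is measured only over entries that were masked out, which is a common convention in GAIN-style evaluations but is not stated in the abstract. The function names `make_mask`, `imputation_rmse`, and `relative_reduction` and the toy values are hypothetical.

```python
import math
import random


def make_mask(n_rows, n_cols, missing_rate, seed=0):
    """Return a 0/1 mask (1 = observed, 0 = missing).

    Assumption: each entry is dropped independently with probability
    missing_rate; the abstract does not state the missingness mechanism.
    """
    rng = random.Random(seed)
    return [[0 if rng.random() < missing_rate else 1 for _ in range(n_cols)]
            for _ in range(n_rows)]


def imputation_rmse(truth, imputed, mask):
    """RMSE computed only over entries that were missing (mask == 0)."""
    sq_err, count = 0.0, 0
    for t_row, i_row, m_row in zip(truth, imputed, mask):
        for t, i, m in zip(t_row, i_row, m_row):
            if m == 0:
                sq_err += (t - i) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask has no missing entries")
    return math.sqrt(sq_err / count)


def relative_reduction(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Toy example with made-up numbers (not results from the paper).
truth = [[0.2, 0.4, 0.6], [0.1, 0.5, 0.9]]
mask = [[1, 0, 1], [0, 1, 0]]
imputed = [[0.2, 0.5, 0.6], [0.3, 0.5, 0.7]]
rmse = imputation_rmse(truth, imputed, mask)
```

Under this convention, a statement such as "28.99% lower than C-GAIN" corresponds to `relative_reduction(rmse_cgain, rmse_ddcgain)`, where both RMSE values are obtained on the same dataset with the same mask.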

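Keywords: conditional generative adversarial imputation network; missing value imputation; incompleteness; feature relationship; dual discriminator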

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
