An Optimized Neighborhood Mean Imputation Algorithm Based on PCA  Cited by: 1

Predicting Missing Data with Neighborhood Mean Based on PCA


Authors: XIE Lin-quan (谢霖铨)[1]; BI Yong-peng (毕永朋); LIAO Long-long (廖龙龙) (Faculty of Science, Jiangxi University of Science and Technology, Ganzhou 341000, China)

Affiliation: [1] Faculty of Science, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China

Source: Software Guide (《软件导刊》), 2018, No. 6, pp. 67-69, 76 (4 pages)

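Funding: National Natural Science Foundation of China (61762047); Youth Science Foundation of the Jiangxi Provincial Department of Science and Technology (20161BAB211015)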

Abstract: Mean imputation is a commonly used way to fill in missing data, but it ignores the relationships between adjacent variables and is highly sensitive to noisy data. This paper applies principal component analysis (PCA) to mean imputation: PCA extracts the feature importance of the adjacent attributes, and these importances serve as weights. A weighted mean of the matrix of attributes adjacent to the missing value is computed in the same way as in mean imputation, treated as the offset that the adjacent attributes contribute, and added to the mean used for filling. Simulation experiments on UCI datasets show that the PCA-based improved algorithm fills missing values markedly more accurately than plain mean imputation.

Keywords: neighborhood mean imputation; principal component analysis; feature importance; offset value

Classification: TP312 [Automation and Computer Technology: Computer Software and Theory]
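The procedure outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so the details below are assumptions rather than the authors' method: attribute importance is taken as the explained-variance-weighted absolute PCA loadings, the offset is the importance-weighted mean of the row's standardized deviations from the column means, and the filled value is the column mean plus that offset rescaled by the column's standard deviation. The function names `pca_importance` and `pca_weighted_mean_impute` are hypothetical, and the sketch assumes at least two complete, non-identical rows.

```python
import numpy as np

def pca_importance(X_complete):
    """Attribute weights from PCA (assumed definition, not from the paper):
    absolute loadings weighted by explained-variance ratio, normalized to sum to 1."""
    std = X_complete.std(axis=0)
    std = np.where(std == 0, 1.0, std)           # avoid division by zero
    Z = (X_complete - X_complete.mean(axis=0)) / std
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # columns of eigvecs are components
    eigvals = np.clip(eigvals, 0.0, None)        # remove tiny negative round-off
    ratio = eigvals / eigvals.sum()
    importance = np.abs(eigvecs) @ ratio         # one score per attribute
    return importance / importance.sum()

def pca_weighted_mean_impute(X):
    """Fill NaNs with the column mean plus an importance-weighted offset
    computed from the other observed attributes in the same row."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete = X[~missing.any(axis=1)]           # rows with no missing values
    mu = np.nanmean(X, axis=0)                   # plain mean-imputation value
    sigma = np.nanstd(X, axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)
    w = pca_importance(complete)
    filled = X.copy()
    for i, j in zip(*np.nonzero(missing)):
        obs = ~missing[i]                        # neighbors observed in row i
        if not obs.any():
            filled[i, j] = mu[j]                 # nothing to weight: plain mean
            continue
        z = (X[i, obs] - mu[obs]) / sigma[obs]   # standardized deviations
        offset = np.dot(w[obs], z) / w[obs].sum()
        filled[i, j] = mu[j] + sigma[j] * offset
    return filled
```

With this formulation, a row whose observed attributes all sit at their column means receives exactly the plain mean, and rows above or below average are shifted in the same direction. A sign-aware weighting would be needed for negatively correlated attributes, which this sketch does not attempt.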

 
