Authors: YOU Feng (游凤); LI Daiwei (李代伟)[1]; ZHANG Haiqing (张海清)[1]; WANG Jie (汪杰); PENG Li (彭莉); WANG Zhen (王震)
Affiliation: [1] College of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China (成都信息工程大学软件工程学院)
Source: Journal of Chengdu University of Information Technology (《成都信息工程大学学报》), 2021, No. 1, pp. 32-40 (9 pages)
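Funding: National Natural Science Foundation of China (61602064); Sichuan Provincial Department of Science and Technology projects (2018JY0273, 2019YFG0398); European Union project (598649-EPP-1-2018-1-FR-EPPKA2-CBHE-JP).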
Abstract: The random forest imputation (RFI) algorithm imputes incomplete information systems reliably, but it must build random forest models many times during imputation, which makes it computationally expensive. To shorten the running time, the NKNNI-RFI (normalization k nearest neighbor imputation-random forest imputation) missing-data imputation algorithm is proposed. It changes the pre-imputation step of RFI: the more accurate normalized KNNI (normalization k nearest neighbor imputation, NKNNI) is used as the initial imputation, which gives the random forest models in RFI data closer to the original data set when they predict the imputed values, so that RFI completes the imputation task in less time while maintaining good imputation quality. In the experiments, 10 UCI standard data sets were used to compare the proposed algorithm with RFI, NKNNI, SVMI and ROUSTIDA, and the results were evaluated with the NRMSE, PFC and ART imputation evaluation measures. The experimental results show that the NRMSE and PFC of the proposed algorithm are the same as those of RFI; its NRMSE is about 0.02-0.8 lower than that of NKNNI, SVMI and ROUSTIDA, its PFC is about 0.01-0.6 lower than that of NKNNI, SVMI and ROUSTIDA, and its ART is reduced by up to 53% compared with RFI.
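The pre-imputation idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of NKNNI, so everything concrete here is an assumption: min-max normalization, Euclidean distance over the features observed in the incomplete row, the mean of the k nearest donors as the fill value, and the function names `min_max_normalize`, `nknni_pre_impute` and `nrmse`. The `nrmse` helper uses one common definition (squared error over the missing entries divided by the variance of the true values), which may differ from the paper's. The random forest stage of RFI is not shown; the output of `nknni_pre_impute` would serve as its starting point.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column to [0, 1], ignoring NaN (missing) entries."""
    lo = np.nanmin(X, axis=0)
    hi = np.nanmax(X, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns: avoid dividing by zero
    return (X - lo) / span, lo, span

def nknni_pre_impute(X, k=3):
    """Assumed NKNNI-style fill: mean of the k nearest rows on normalized data."""
    X = np.asarray(X, dtype=float)
    Z, lo, span = min_max_normalize(X)
    filled = Z.copy()
    incomplete_rows = np.argwhere(np.isnan(Z).any(axis=1)).ravel()
    for i in incomplete_rows:
        obs = ~np.isnan(Z[i])  # features observed in row i
        for j in np.where(~obs)[0]:
            # Donors: other rows that have feature j and all of row i's observed features.
            donors = np.array([r for r in range(len(Z))
                               if r != i
                               and not np.isnan(Z[r, j])
                               and not np.isnan(Z[r, obs]).any()], dtype=int)
            if donors.size == 0:
                filled[i, j] = np.nanmean(Z[:, j])  # fallback: column mean
                continue
            dist = np.sqrt(((Z[donors][:, obs] - Z[i, obs]) ** 2).sum(axis=1))
            nearest = donors[np.argsort(dist, kind="stable")[:k]]
            filled[i, j] = Z[nearest, j].mean()
    return filled * span + lo  # map back to the original scale

def nrmse(X_true, X_imputed, mask):
    """Assumed NRMSE: error on the masked (originally missing) entries only."""
    diff = X_true[mask] - X_imputed[mask]
    return np.sqrt(np.mean(diff ** 2) / np.var(X_true[mask]))
```

With a small array such as `np.array([[1., 10.], [2., 20.], [3., np.nan], [4., 40.], [5., 50.]])` and `k=2`, the missing entry is filled from the two rows closest to it in the first column. Starting the random forest iterations from such a fill, rather than from a cruder one, is the mechanism the abstract credits for the shorter running time.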
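Keywords: incomplete information system; missing data imputation; NKNNI; random forest imputation; imputation evaluation methods
Classification number: TP301.6 [Automation and Computer Technology: Computer Systems Architecture]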