Multiple Imputation-Revision Ensemble Classification with Neighborhood Information

Cited by: 2

Authors: ZHU Xianyuan [1], YAN Yuanting [2], ZHANG Yanping [2]

Affiliations: [1] School of Information and Artificial Intelligence, Anhui Business College of Vocational Technology, Wuhu, Anhui 241002, China; [2] School of Computer Science and Technology, Anhui University, Hefei 230601, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2023, Issue 23, pp. 125-135 (11 pages)

Funding: National Natural Science Foundation of China (61872002, 62272001); Key Natural Science Research Projects of Anhui Universities (KJ2021A1483, 2022AH052740, 2023AH052296).

Abstract: Missing values in an incomplete dataset must be imputed before classification. Classic imputation algorithms such as mean imputation and K-nearest-neighbor (KNN) imputation each have their strengths, but their estimates are easily disturbed by data that is only weakly related to the missing value, which degrades imputation quality and, in turn, subsequent classification performance. To address this issue, this paper proposes a multiple imputation-revision ensemble classification method for incomplete data that uses neighborhood information. The method embeds an imputation-revision module into the imputation process: it uses neighborhood purity and neighborhood radius to select the neighboring samples of each sample whose imputed values are to be revised, then revises the imputed values based on those neighbors, further improving imputation accuracy. The method also combines the strengths of several classic imputation algorithms and exploits the diversity of the multiple imputed datasets, improving classification accuracy through ensemble learning. Experimental results show that the method outperforms the compared methods in both imputation quality and classification accuracy on benchmark datasets, and that it also achieves better classification accuracy on real-world incomplete datasets.
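The pipeline described in the abstract (impute several ways, revise each imputation using a purity- and radius-filtered neighborhood, then vote) can be illustrated with a small pure-Python sketch. This is not the authors' algorithm: the abstract does not give the formulas, so the revision rule below (revise only when the share of same-class neighbors within `radius` reaches `min_purity`, then average the observed values of those same-class neighbors), the 1-NN base classifier, the default parameter values, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

def mean_impute(X):
    """Fill each missing cell (None) with the mean of its column's observed values."""
    means = []
    for col in zip(*X):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / max(1, len(seen)))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]

def knn_impute(X, k=2):
    """Fill each missing cell with the mean of that feature over the k nearest
    rows that observe it (distance uses only features observed in both rows)."""
    def dist(a, b):
        sq = [(p - q) ** 2 for p, q in zip(a, b) if p is not None and q is not None]
        return math.sqrt(sum(sq) / len(sq)) if sq else math.inf
    fallback = mean_impute(X)
    out = [row[:] for row in X]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            if v is None:
                donors = sorted((dist(row, r), r[j]) for t, r in enumerate(X)
                                if t != i and r[j] is not None)
                near = [val for d, val in donors[:k] if d < math.inf]
                out[i][j] = sum(near) / len(near) if near else fallback[i][j]
    return out

def revise(filled, X, y, radius=2.5, min_purity=0.6):
    """ASSUMED revision rule (not taken from the paper): for each sample with
    missing values, collect neighbors within `radius` in the imputed space; if
    the fraction sharing its label is at least `min_purity`, overwrite each
    imputed cell with the mean of the same-class neighbors' observed values."""
    out = [row[:] for row in filled]
    for i, row in enumerate(X):
        if None not in row:
            continue
        nbrs = [t for t in range(len(X))
                if t != i and math.dist(filled[i], filled[t]) <= radius]
        same = [t for t in nbrs if y[t] == y[i]]
        if not same or len(same) / len(nbrs) < min_purity:
            continue
        for j, v in enumerate(row):
            if v is None:
                donors = [X[t][j] for t in same if X[t][j] is not None]
                if donors:
                    out[i][j] = sum(donors) / len(donors)
    return out

def predict_1nn(train, y, x):
    """Label of the training row closest to x (a stand-in base classifier)."""
    return y[min(range(len(train)), key=lambda t: math.dist(train[t], x))]

def ensemble_predict(X, y, x_new):
    """Majority vote over one 1-NN classifier per imputed-and-revised copy of X."""
    imputers = (mean_impute, knn_impute, lambda data: knn_impute(data, k=1))
    copies = [revise(imp(X), X, y) for imp in imputers]
    votes = [predict_1nn(c, y, x_new) for c in copies]
    return Counter(votes).most_common(1)[0][0]

# Tiny made-up dataset: two well-separated classes, one missing cell in each.
X_demo = [[1.0, 1.0], [1.2, None], [0.9, 1.1],
          [5.0, 5.0], [5.2, None], [4.9, 5.1]]
y_demo = [0, 0, 0, 1, 1, 1]
```

On this toy data, mean imputation fills both missing cells with the global column mean, which lies between the two clusters; the assumed revision step then pulls each value back toward its own class neighborhood. That is the kind of interference from weakly related samples the abstract says the method is designed to reduce.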

Keywords: incomplete data; classification; imputation revision; neighborhood information; ensemble learning

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
