Authors: ZHU Xianyuan [1], YAN Yuanting [2], ZHANG Yanping [2]
Affiliations: [1] School of Information and Artificial Intelligence, Anhui Business College of Vocational Technology, Wuhu, Anhui 241002, China; [2] School of Computer Science and Technology, Anhui University, Hefei 230601, China
Source: Computer Engineering and Applications (《计算机工程与应用》), 2023, No. 23, pp. 125-135 (11 pages)
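Funding: National Natural Science Foundation of China (61872002, 62272001); Key Natural Science Research Projects of Anhui Higher Education Institutions (KJ2021A1483, 2022AH052740, 2023AH052296).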
Abstract: Incomplete datasets must have their missing values imputed before they can be classified. Several classic imputation algorithms exist, such as mean imputation and K-nearest-neighbor imputation. Each has its strengths, but their estimates are easily disturbed by data that is only weakly related to the missing value, which degrades imputation quality and, in turn, subsequent classification performance. To address this problem, the paper proposes an incomplete data classification method based on a multiple imputation-revision ensemble with local (neighborhood) information. The method embeds a revision module in the imputation process: it uses neighborhood purity and neighborhood radius to select the neighboring samples on which the revision is based, then re-estimates the missing values from those neighbors to further improve imputation accuracy. It also combines the strengths of several classic imputation algorithms, exploiting the diversity of the multiply imputed datasets and applying ensemble learning to raise classification accuracy. Experimental results show that the method outperforms the compared methods in both imputation quality and classification accuracy on benchmark datasets, and that it also achieves better classification accuracy on real-world incomplete datasets.
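The pipeline outlined in the abstract (several imputations, a neighborhood-based revision, then an ensemble vote) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define purity or neighborhood radius precisely, so the `revise` rule here (re-estimate an imputed entry from same-class neighbors within `radius` when the same-class share is at least `purity`) is an assumption, as are all function names, the choice of mean, median, and k-NN imputers, the 1-NN base classifier, and the toy data. It also assumes every feature is observed in at least one other row.

```python
import math
from collections import Counter
from statistics import mean, median

def dist(a, b):
    """Euclidean distance over features observed (not None) in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs)) if pairs else math.inf

def column_fill(X, stat):
    """Fill each missing entry with a column statistic (e.g. mean or median)."""
    cols = [[v for v in col if v is not None] for col in zip(*X)]
    return [[stat(cols[j]) if v is None else v for j, v in enumerate(row)]
            for row in X]

def knn_fill(X, k=2):
    """Fill each missing entry with the feature mean over the k nearest rows."""
    out = [list(r) for r in X]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            if v is None:
                cands = sorted((dist(row, o), o[j]) for t, o in enumerate(X)
                               if t != i and o[j] is not None)
                out[i][j] = mean(val for _, val in cands[:k])
    return out

def revise(X_filled, X_raw, y, radius, purity):
    """Assumed revision rule: if the neighborhood within `radius` is pure
    enough, replace imputed entries with the mean of observed values from
    same-class neighbors. Observed entries are never changed."""
    out = [list(r) for r in X_filled]
    for i, raw in enumerate(X_raw):
        near = [t for t in range(len(X_filled))
                if t != i and dist(X_filled[i], X_filled[t]) <= radius]
        same = [t for t in near if y[t] == y[i]]
        if not near or len(same) / len(near) < purity:
            continue
        for j, v in enumerate(raw):
            if v is None:
                donors = [X_raw[t][j] for t in same if X_raw[t][j] is not None]
                if donors:
                    out[i][j] = mean(donors)
    return out

def predict_1nn(X_train, y_train, x):
    """Label of the training row closest to x."""
    return min(zip(X_train, y_train), key=lambda p: dist(p[0], x))[1]

def ensemble_predict(X_raw, y, x, radius=3.0, purity=0.6):
    """Majority vote over classifiers trained on differently imputed data."""
    imputations = [column_fill(X_raw, mean), column_fill(X_raw, median),
                   knn_fill(X_raw)]
    votes = [predict_1nn(revise(Xf, X_raw, y, radius, purity), y, x)
             for Xf in imputations]
    return Counter(votes).most_common(1)[0][0]

# Toy data (hypothetical): two well-separated classes, None marks a missing value.
X_raw = [[1.0, 1.0], [1.2, None], [0.8, 1.1],
         [5.0, 5.0], [None, 5.2], [5.1, 4.9]]
y = [0, 0, 0, 1, 1, 1]
label = ensemble_predict(X_raw, y, [1.1, 0.9])
```

In this toy setup, mean imputation fills the missing entry of the second row with the global column mean, which sits between the two classes; the revision step then pulls it back toward the values of its same-class neighbors. That is the kind of interference from weakly related data that the abstract describes.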
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]