Authors: CAO Weiquan (曹卫权); CHU Yanjie (褚衍杰); LI Xian (李显) (National Key Laboratory of Science and Technology on Blind Signal Processing, 610041, China)
Source: Journal of Xi'an Jiaotong University (《西安交通大学学报》), 2017, No. 10, pp. 142-148 (7 pages)
Funding: National Natural Science Foundation of China (U1536105)
Abstract: Machine learning data with missing entries cannot be used effectively, which lowers classification and regression accuracy. To address this, an approximate imputation method called k-ANNO is proposed. Given an incomplete sample, the method first uses a graph structure built offline to approximately search for the k nearest neighbor vertices of that sample, then estimates the optimal weight of each neighbor with a fast quadratic programming algorithm, and finally imputes the missing entries from the weighted neighbors. Users can trade imputation efficiency against accuracy according to their needs. According to the authors, k-ANNO addresses the missing-data problem that is widespread in machine learning and effectively suppresses its effect on classification and regression accuracy. Evaluation on several public datasets shows that, for speedup ratios between 2 and 10, the classification error rate of k-ANNO is 1% to 4% lower than that of existing mean imputation, C-means imputation, and self-organizing map imputation, and its regression root-mean-square error is about 0.5 to 2.0 lower. With a sample size of 4,000, under different speedup parameters, k-ANNO is about 35% to 320% more computationally efficient than naive k-nearest-neighbor imputation.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
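
The three steps named in the abstract (graph-based approximate neighbor search, weight estimation, weighted fill-in) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the greedy best-first graph walk, the `budget` parameter standing in for the speed/accuracy trade-off, and the closed-form sum-to-one least-squares weights (used here in place of the paper's fast quadratic programming solver) are all assumptions made for illustration.

```python
import numpy as np


def build_knn_graph(data, degree):
    """Offline step (assumed form): link each complete sample to its nearest samples."""
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a vertex is not its own neighbor
    return np.argsort(dists, axis=1)[:, :degree]


def graph_search(data, graph, query, observed, k, start=0, budget=50):
    """Greedy best-first walk over the graph, comparing observed dimensions only.

    A smaller `budget` visits fewer vertices: faster, but less accurate.
    """
    def dist(i):
        return float(np.linalg.norm(data[i, observed] - query[observed]))

    visited = {start: dist(start)}
    frontier = [start]
    while frontier and len(visited) < budget:
        frontier.sort(key=visited.get)
        current = frontier.pop(0)  # expand the closest unexpanded vertex
        for j in graph[current]:
            j = int(j)
            if j not in visited:
                visited[j] = dist(j)
                frontier.append(j)
    return sorted(visited, key=visited.get)[:k]


def neighbor_weights(neighbors_obs, query_obs, reg=1e-3):
    """Minimize ||query - w @ neighbors||^2 subject to sum(w) == 1 (closed form).

    Hypothetical stand-in for the quadratic programming step; no w >= 0 constraint.
    """
    diff = neighbors_obs - query_obs
    gram = diff @ diff.T
    scale = np.trace(gram) or 1.0  # avoid a zero regularizer when all diffs vanish
    w = np.linalg.solve(gram + reg * scale * np.eye(len(gram)), np.ones(len(gram)))
    return w / w.sum()


def impute(data, graph, sample, k=3, budget=50):
    """Fill the NaN entries of `sample` with a weighted mix of its neighbors."""
    missing = np.isnan(sample)
    observed = ~missing
    idx = graph_search(data, graph, sample, observed, k, budget=budget)
    w = neighbor_weights(data[idx][:, observed], sample[observed])
    filled = sample.copy()
    filled[missing] = w @ data[idx][:, missing]
    return filled


# Usage on synthetic data: the last column is the sum of the first two.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
data[:, 3] = data[:, 0] + data[:, 1]
graph = build_knn_graph(data, degree=8)
print(impute(data, graph, np.array([0.5, -0.2, 0.1, np.nan]), k=5))
```

The brute-force graph construction is quadratic in the number of samples and is only meant to keep the sketch short. In this sketch the search always starts from vertex 0, so a small `budget` may stop before reaching the true nearest neighbors.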