Authors: WANG Jun-lu (王俊陆) [1], WANG Ling (王玲) [1], WANG Yan (王妍) [1,2], SONG Bao-yan (宋宝燕) [1]
Affiliations: [1] School of Information, Liaoning University, Shenyang 110036, China; [2] School of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Source: Computer Science (《计算机科学》), 2017, No. 2, pp. 98-102, 106 (6 pages)
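Funding: National Natural Science Foundation of China (61472169; 61472072); National Key Technology R&D Program (2012BAF13B08); National "973" Key Basic Research Development Program, preliminary research special project (2014CB360509); Liaoning Province Scientific Undertaking Public Welfare Research Fund (2015003003); Liaoning University Research Fund, science and technology category (LDQN2015001)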
Abstract: With the development of the Internet and information technology, missing and corrupted data have become increasingly common. As data collection shifts from manual to automated, unstable storage media and omissions during network transmission make the problem more severe. Large numbers of missing values in a database not only seriously degrade the quality of user queries but also compromise the correctness of data mining and data analysis results, which in turn misleads decision-making. At present there is no general-purpose method for imputing missing data; most strategies handle only one type of missing value. To address the complex case in which different types of missingness occur together in incomplete data, this paper proposes an incomplete data imputation approach based on tuple similarity (IATS). Weighted association rules are extracted from the incomplete data set by data mining and used to impute regular missing data. For abnormal missing data, a data recommendation algorithm is introduced: a recommendation-and-screening strategy computes tuple similarity and performs the corresponding imputation. This greatly improves the effective utilization of the data and the quality of user query results. Experiments show that the IATS strategy achieves better accuracy while maintaining the imputation rate.
Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]
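The abstract's idea of filling a missing value from similar tuples can be illustrated with a minimal sketch. This is not the IATS algorithm itself: the record does not give the paper's similarity measure, its screening strategy, or the weighted association rule step, so everything below is assumed for illustration. The names `tuple_similarity` and `impute`, the matching-fraction similarity, and the parameter `k` are all hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

Row = Sequence[Optional[str]]


def tuple_similarity(a: Row, b: Row) -> float:
    """Assumed measure: share of equal values among attributes present in both tuples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def impute(rows: Sequence[Row], k: int = 2) -> List[List[Optional[str]]]:
    """Fill each None with a similarity-weighted vote over the k most similar donors."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other tuples that actually have attribute j.
            donors = [
                (tuple_similarity(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = sorted(donors, key=lambda d: d[0], reverse=True)[:k]
            votes: Counter = Counter()
            for sim, candidate in donors:
                if sim > 0:  # ignore tuples with no agreement at all
                    votes[candidate] += sim
            if votes:
                filled[i][j] = votes.most_common(1)[0][0]
            # Otherwise the cell stays None: no similar tuple was found.
    return filled


if __name__ == "__main__":
    data = [("a", "x", "1"), ("a", "x", None), ("b", "y", "2")]
    print(impute(data))
```

A cell is left empty when no donor agrees with the incomplete tuple on any shared attribute, which trades imputation rate for accuracy; the abstract suggests the paper balances the two, but its actual criterion is not stated in this record.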