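Affiliations: [1] Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu, Sichuan 610041; [2] Key Laboratory of Advanced Manufacturing Technology of the Ministry of Education, Guizhou University, Guiyang, Guizhou 550003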
Source: Journal of Zhejiang University: Engineering Science (《浙江大学学报(工学版)》), 2015, No. 2, pp. 303-308 (6 pages)
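Funding: National Natural Science Foundation of China (51475097); National Science and Technology Support Program of the 12th Five-Year Plan (2012BAF12B14); Guizhou Province science and technology funding projects (Qiankehe JZ [2014] 2001, Qiankehe Ji Z [2012] 4009)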
Abstract: Existing quality evaluation models for conflicting data sources consider only accuracy and precision. They ignore the fact that a source providing a false description and a source providing an empty value affect source quality differently. In this paper, a false description provided by a data source is defined as an active error, and the absence of any description for an entity is defined as a passive error. A data source quality model is built on these two kinds of error, replacing accuracy and precision with sensitivity and specificity. To handle the multi-truth problem, the descriptions that data sources give for an entity are merged in advance, and an inclusion relation between merged descriptions is defined together with a model for computing the inclusion degree. On the basis of that model, an algorithm for evaluating the quality of conflicting data sources by description inclusion degree (TFDQ) is proposed. Experiments on the general-purpose Books-Authors data set show that the results of TFDQ are closer to the ground truth than those of the Vote and TruthFinder algorithms.
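The abstract does not give the formulas behind the quality model, so the following is only a minimal sketch of the textbook definitions of sensitivity and specificity applied to one data source. It is not the TFDQ algorithm. The mapping of active errors to false positives and passive errors to false negatives is an assumption made here for illustration, and the function name `source_quality`, the `candidates` argument, and the sample data are all hypothetical.

```python
def source_quality(source_claims, truth, candidates):
    """Return (sensitivity, specificity) of one source.

    source_claims: dict entity -> set of values the source asserts
                   (a missing entity means the source gave no description)
    truth:         dict entity -> set of true values
    candidates:    dict entity -> set of all values any source asserted
    Assumption (not from the paper): a false value the source asserts is
    counted as an active error (false positive); a true value the source
    fails to supply is counted as a passive error (false negative).
    """
    tp = fp = fn = tn = 0
    for entity, true_values in truth.items():
        claimed = source_claims.get(entity, set())
        for value in candidates[entity]:
            if value in true_values and value in claimed:
                tp += 1
            elif value in true_values:
                fn += 1  # passive error: true value not supplied
            elif value in claimed:
                fp += 1  # active error: false value supplied
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical toy data in the spirit of a books/authors data set.
truth = {"book1": {"A", "B"}, "book2": {"C"}}
candidates = {"book1": {"A", "B", "X"}, "book2": {"C", "Y"}}
claims = {
    "s1": {"book1": {"A", "B"}, "book2": {"C"}},
    "s2": {"book1": {"A", "X"}},  # one wrong author, and no entry for book2
}

for name, source_claims in claims.items():
    print(name, source_quality(source_claims, truth, candidates))
```

Under these assumptions, a source that leaves values out lowers only its sensitivity, while a source that asserts wrong values lowers its specificity, which is the distinction between passive and active errors that the abstract draws.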
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]