Cleaning inconsistent data based on statistical inference (基于统计推理的不一致数据清洗方法)  Cited by: 2


Authors: Zhang Anzhen (张安珍); Hu Shengji (胡生吉); Xia Xiufeng (夏秀峰) (Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China; School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China)

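Affiliations: [1] Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168 [2] School of Computer Science, Shenyang Aerospace University, Shenyang 110136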

Source: Application Research of Computers (《计算机应用研究》), 2024, No. 10, pp. 2987-2992 (6 pages)

Funding: Supported by the National Natural Science Foundation of China, Young Scientists Fund (6210071734).

Abstract: Inconsistent data repair is an important research direction in data cleaning. Most existing methods rely on integrity constraints and repair data according to the minimum-cost principle. However, the minimum-cost repair is usually not the correct one, so these methods achieve low accuracy. To address this problem, the paper proposes BayesOUR, an inconsistent data cleaning method based on statistical inference that balances repair cost against repair quality to improve repair accuracy. BayesOUR works in three phases: first, it detects errors using the integrity constraints; next, it uses a Bayesian network to infer the probability of every possible consistent repair; finally, it applies the repair with the highest probability. Experimental results on real data show that the method significantly improves the accuracy of inconsistent data repair compared with current leading methods.
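
The three phases named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' BayesOUR implementation. It assumes a single functional dependency (`zip` determines `city`) as the integrity constraint, restricts candidate repairs to values already present in each violating group, and replaces Bayesian network inference with a Laplace-smoothed conditional frequency on one evidence attribute. The function names, the `TOY_ROWS` data, and the `province` attribute are all hypothetical.

```python
from collections import defaultdict
from math import prod

# Toy data invented for illustration; not taken from the paper.
TOY_ROWS = [
    {"zip": "110136", "city": "Shenyan",  "province": "Liaoning"},
    {"zip": "110136", "city": "Shenyan",  "province": "Liaoning"},
    {"zip": "110136", "city": "Shenyang", "province": "Liaoning"},
    {"zip": "110168", "city": "Shenyang", "province": "Liaoning"},
    {"zip": "110168", "city": "Shenyang", "province": "Liaoning"},
    {"zip": "110000", "city": "Shenyang", "province": "Liaoning"},
    {"zip": "100080", "city": "Beijing",  "province": "Beijing"},
]

def detect_violations(rows, lhs, rhs):
    """Phase 1: return groups of row indices that violate the dependency lhs -> rhs."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[lhs]].append(i)
    return {key: idx for key, idx in groups.items()
            if len({rows[i][rhs] for i in idx}) > 1}

def conditional_prob(rows, rhs, value, given_attr, given_value):
    """Phase 2 stand-in: Laplace-smoothed estimate of P(rhs = value | given_attr = given_value).

    A real Bayesian network would combine evidence from several attributes.
    """
    domain = {r[rhs] for r in rows}
    matching = [r for r in rows if r[given_attr] == given_value]
    hits = sum(1 for r in matching if r[rhs] == value)
    return (hits + 1) / (len(matching) + len(domain))

def repair(rows, lhs, rhs, evidence_attr):
    """Phase 3: for each violating group, apply the candidate value with the highest score."""
    repaired = [dict(r) for r in rows]
    for idx in detect_violations(rows, lhs, rhs).values():
        # Assumption: candidates are limited to values observed in the group.
        candidates = sorted({rows[i][rhs] for i in idx})
        best = max(candidates, key=lambda c: prod(
            conditional_prob(rows, rhs, c, evidence_attr, rows[i][evidence_attr])
            for i in idx))
        for i in idx:
            repaired[i][rhs] = best
    return repaired

cleaned = repair(TOY_ROWS, "zip", "city", "province")
```

On this toy table a minimum-cost repair would change one cell and keep the majority value `Shenyan` for zip `110136`, whereas the probability-based choice changes two cells to `Shenyang` because that value is better supported by the rest of the table. This mirrors the abstract's point that the cheapest repair need not be the correct one.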

Keywords: inconsistent data; Bayesian network; statistical inference

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]

 
