Authors: ZHANG An-Zhen; SI Jia-Yu; LIANG Tian-Yu; ZHU Rui; QIU Tao
Affiliation: [1] School of Computer Science, Shenyang Aerospace University, Shenyang 110136, Liaoning, China
Source: Journal of Software (《软件学报》), 2024, No. 9, pp. 4448-4468 (21 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (62102271, 62002245); Basic Research Project of the Education Department of Liaoning Province (JYT2020027).
Abstract: Subset repair for inconsistent data is an important research problem in data cleaning. Most existing methods are based on integrity constraint rules and repair the data by deleting the minimum number of tuples. However, these methods ignore the quality of the deleted tuples, which lowers repair accuracy. This study therefore proposes a subset repair method that combines rules and probabilities. The probability of each inconsistent tuple is modeled so that the average probability of correct tuples is greater than that of wrong tuples, and the subset repair that minimizes the sum of the probabilities of the deleted tuples is then computed. In addition, to reduce the time overhead of computing the probabilities of inconsistent tuples, the study proposes an efficient error detection method that reduces the number of inconsistent tuples. Experimental results on real and synthetic data verify that the proposed method outperforms the state-of-the-art subset repair method in terms of accuracy.
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
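Note: the objective named in the abstract (delete the set of tuples whose probabilities sum to the minimum while removing every rule violation) can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a single functional dependency as the integrity constraint, treats each weight as a hypothetical probability that the tuple is correct, and uses exhaustive search that is only practical for tiny inputs. The names `conflicts`, `min_weight_repair`, `rows`, and `weights`, and the toy data, are all invented for illustration.

```python
from itertools import combinations


def conflicts(rows, lhs, rhs):
    """Return index pairs of rows that violate the functional dependency lhs -> rhs."""
    pairs = []
    for i, j in combinations(range(len(rows)), 2):
        same_lhs = all(rows[i][a] == rows[j][a] for a in lhs)
        same_rhs = all(rows[i][a] == rows[j][a] for a in rhs)
        if same_lhs and not same_rhs:
            pairs.append((i, j))
    return pairs


def min_weight_repair(rows, weights, lhs, rhs):
    """Brute-force search for the cheapest set of rows to delete.

    A deletion set is valid when every conflicting pair loses at least one
    row. The cost of a set is the sum of the (assumed) correctness
    probabilities of the rows it deletes.
    """
    edges = conflicts(rows, lhs, rhs)
    involved = sorted({i for edge in edges for i in edge})
    best, best_cost = None, float("inf")
    for k in range(len(involved) + 1):
        for subset in combinations(involved, k):
            chosen = set(subset)
            if all(i in chosen or j in chosen for i, j in edges):
                cost = sum(weights[i] for i in chosen)
                if cost < best_cost:
                    best, best_cost = chosen, cost
    return best, best_cost


# Toy data (hypothetical): the rule is zip -> city, and row 2 has a typo.
rows = [
    {"zip": "110136", "city": "Shenyang"},
    {"zip": "110136", "city": "Shenyang"},
    {"zip": "110136", "city": "Shenyan"},
    {"zip": "650000", "city": "Kunming"},
]
# Hypothetical probabilities that each row is correct.
weights = [0.9, 0.8, 0.3, 0.95]

deleted, cost = min_weight_repair(rows, weights, ["zip"], ["city"])
print(sorted(deleted), cost)
```

With these weights the search deletes only row 2. If the weights were instead `[0.2, 0.2, 0.9, 0.95]`, it would delete rows 0 and 1, whereas a minimum-count repair would still delete row 2. That difference is the one the abstract draws between count-based and probability-based repair.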