Mining Method for Data Quality Detection Rules (数据质量检测规则挖掘方法)  Cited by: 8

Authors: LIU Bo (刘波)[1], GENG Yinrong (耿寅融)[1]

Affiliation: [1] Department of Computer Science, College of Information Science and Technology, Jinan University, Guangzhou 510632

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2012, No. 5, pp. 835-844 (10 pages)

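Funding: Supported by the National Natural Science Foundation of China (No.61003056), the Natural Science Foundation of Guangdong Province (No.S2012010008831), and the Guangdong Province Science and Technology Research Project (No.2010B010600026)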

Abstract: Data quality rules are key to detecting quality problems in a database. To discover data quality rules automatically from relational databases and use them to detect erroneous data, this paper studies the representation of quality rules and the measures for evaluating them. It proposes criteria for computing minimal quality rules based on data item groups and their confidence, mining algorithms for those rules, and an approach to detecting erroneous data with them. The rule form borrows the confidence-based evaluation mechanism of association rules and the expressive power of conditional functional dependencies, so that functional dependencies, conditional functional dependencies, and association rules are described in a single format. The rules are concise, objective, and comprehensive, and they detect erroneous or abnormal data accurately. Compared with related work, the proposed algorithms have lower time complexity and the discovered rules achieve a higher error detection rate. Experiments demonstrate the effectiveness and correctness of the method.
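
The abstract does not give the formal definition of a quality rule or of its confidence, so the following is only a minimal sketch of the general idea (group tuples by the values of some attributes, measure how consistently each group maps to one value of another attribute, and flag tuples that deviate from a high-confidence group). It is not the authors' algorithm. The function names `mine_rules` and `detect_violations`, the `min_conf` and `min_support` parameters, and the toy table are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def mine_rules(rows, lhs, rhs, min_conf=0.75, min_support=2):
    """Group rows by their LHS values and keep each group whose most
    frequent RHS value reaches the confidence threshold.

    Hypothetical helper: returns {lhs_values: (rhs_value, confidence)}.
    """
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1

    rules = {}
    for key, counts in groups.items():
        total = sum(counts.values())
        value, hits = counts.most_common(1)[0]
        conf = hits / total  # share of the group agreeing with the dominant value
        if total >= min_support and conf >= min_conf:
            rules[key] = (value, conf)
    return rules


def detect_violations(rows, lhs, rhs, rules):
    """Return the indices of rows that match a rule's LHS values
    but disagree with the RHS value the rule expects."""
    suspects = []
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in lhs)
        if key in rules and row[rhs] != rules[key][0]:
            suspects.append(i)
    return suspects


# Toy data invented for this sketch (not taken from the paper's experiments).
rows = [
    {"zip": "510632", "city": "Guangzhou"},
    {"zip": "510632", "city": "Guangzhou"},
    {"zip": "510632", "city": "Guangzhou"},
    {"zip": "510632", "city": "Guangzhou"},
    {"zip": "510632", "city": "Shenzhen"},   # deviates from its group
    {"zip": "100084", "city": "Beijing"},    # group too small to form a rule
]

rules = mine_rules(rows, lhs=["zip"], rhs="city")
suspects = detect_violations(rows, ["zip"], "city", rules)
print(rules)
print(suspects)
```

In this reading, a group with confidence 1.0 behaves like a constant conditional functional dependency, while a threshold below 1.0 tolerates some dirty tuples in the group, which is what allows them to be flagged as suspects rather than preventing the rule from being found.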

Keywords: data quality rules; detection; mining; data item grouping

Classification code: TP311.13 [Automation and Computer Technology - Computer Software and Theory]

 
