Inconsistent data repair algorithm based on a distributed computing framework


Authors: YU Xiangxiang, ZHONG Yong [1,2], LI Zhendong, HAN Xiao [1,2]

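Affiliations: [1] Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China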

Source: Journal of Computer Applications (《计算机应用》), 2019, Issue S02, pp. 164-168 (5 pages)

Funding: Sichuan Province Science and Technology Support Program (2014GZ0013)

Abstract: To address data inconsistency in big data environments, an inconsistent data detection and repair algorithm based on MapReduce was proposed and implemented. Conditional functional dependencies (CFDs), which add semantic constraints to traditional functional dependencies, were first divided by form of expression into constant CFDs and variable CFDs. The consistency of the CFD set was then checked to ensure that the CFDs in the set do not conflict with one another. Next, CFD violations were resolved by modifying the target value of the equivalence class. Finally, drawing on the running characteristics of the different MapReduce stages, data violating constant CFDs were repaired on the map side and data violating variable CFDs were repaired on the reduce side. Experimental results show that, at the same error rate, the CFD-based algorithm achieves higher accuracy and better scalability than the traditional algorithm.
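The map-side and reduce-side split described in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the CFD representation, the names `map_phase`, `reduce_phase` and `repair`, the sample records, and in particular the choice of the majority value as the equivalence class's target value are all assumptions made for illustration. The shuffle step is simulated with a dictionary rather than a real MapReduce runtime.

```python
from collections import Counter, defaultdict

WILDCARD = "_"  # hypothetical marker for "any value" in a CFD pattern


def matches(record, pattern):
    """Return True if the record agrees with every constant in the pattern."""
    return all(v == WILDCARD or record[a] == v for a, v in pattern.items())


def map_phase(record, constant_cfds, variable_cfd):
    """Repair constant-CFD violations locally, then key the record for the reducer."""
    fixed = dict(record)
    for lhs, (attr, value) in constant_cfds:
        if matches(fixed, lhs) and fixed[attr] != value:
            fixed[attr] = value  # a constant CFD names the only legal value
    lhs, _attr = variable_cfd
    if matches(fixed, lhs):
        key = tuple(fixed[a] for a in sorted(lhs))  # equivalence-class key
    else:
        key = None  # the variable CFD does not apply to this record
    return key, fixed


def reduce_phase(key, records, variable_cfd):
    """Make one equivalence class agree on the right-hand-side attribute."""
    _lhs, attr = variable_cfd
    if key is None:
        return records
    # Assumption: the target value is the most frequent value in the class.
    target, _count = Counter(r[attr] for r in records).most_common(1)[0]
    return [{**r, attr: target} for r in records]


def repair(records, constant_cfds, variable_cfd):
    """Simulate the shuffle in memory: group map output by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        key, fixed = map_phase(record, constant_cfds, variable_cfd)
        groups[key].append(fixed)
    repaired = []
    for key, group in groups.items():
        repaired.extend(reduce_phase(key, group, variable_cfd))
    return repaired


# Hypothetical data: country code (cc), area code (ac), city, zip, street.
RECORDS = [
    {"cc": "44", "ac": "131", "city": "NYC", "zip": "EH4", "street": "Mayfield"},
    {"cc": "44", "ac": "131", "city": "EDI", "zip": "EH4", "street": "Mayfield"},
    {"cc": "44", "ac": "131", "city": "EDI", "zip": "EH4", "street": "Crichton"},
    {"cc": "01", "ac": "908", "city": "MH", "zip": "07974", "street": "Mtn Ave"},
]
# Constant CFD: cc = 44 and ac = 131 imply city = EDI.
CONSTANT_CFDS = [({"cc": "44", "ac": "131"}, ("city", "EDI"))]
# Variable CFD: when cc = 44, zip determines street.
VARIABLE_CFD = ({"cc": "44", "zip": WILDCARD}, "street")

repaired = repair(RECORDS, CONSTANT_CFDS, VARIABLE_CFD)
```

Constant CFDs can be checked one record at a time, which is why they fit the map side; a variable CFD can only be violated by a pair of records, so the records sharing the same left-hand-side values must first be grouped by key and compared in the reducer.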

Keywords: big data; data quality; inconsistency; conditional functional dependency; MapReduce

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
