Authors: YU Xiangxiang (于祥祥); ZHONG Yong (钟勇) [1,2]; LI Zhendong (李振东); HAN Xiao (韩啸) [1,2]
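Affiliations: [1] Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China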
Source: Journal of Computer Applications (《计算机应用》), 2019, Issue S02, pp. 164-168 (5 pages)
Funding: Sichuan Province Science and Technology Support Program (2014GZ0013)
Abstract: To address data inconsistency in big data environments, an inconsistency detection and repair algorithm based on MapReduce is proposed and implemented. The algorithm builds on conditional functional dependencies (CFDs), which add semantic constraints to traditional functional dependencies. First, CFDs are divided by form of expression into constant CFDs and variable CFDs. Next, the CFD set is checked for consistency to ensure that its dependencies do not conflict with one another. Violations of a CFD are then resolved by modifying the target value of the equivalence class. Finally, exploiting the characteristics of the different MapReduce stages, data violating constant CFDs are repaired on the map side and data violating variable CFDs are repaired on the reduce side. Experimental results show that, at the same error rate, the CFD-based algorithm achieves higher accuracy and better scalability than the traditional algorithm.
Keywords: big data; data quality; inconsistency; conditional functional dependency; MapReduce
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
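
The split between map-side and reduce-side repair described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the rules, attribute names, and function names below are hypothetical, MapReduce is simulated with plain Python, and the choice of the most frequent value as the equivalence-class target is an assumption, since the abstract does not say how the target value is selected.

```python
from collections import Counter, defaultdict

# Hypothetical rules for illustration only (not taken from the paper).
# Constant CFD: country == "CN" and area_code == "028"  =>  city == "Chengdu".
CONSTANT_CFDS = [({"country": "CN", "area_code": "028"}, "city", "Chengdu")]
# Variable CFD: among records with country == "CN", equal zip => equal city.
VARIABLE_CFD = ({"country": "CN"}, ("zip",), "city")


def matches(record, pattern):
    """True if the record carries every constant required by the pattern."""
    return all(record.get(attr) == value for attr, value in pattern.items())


def map_phase(record):
    """Repair constant-CFD violations locally, then emit (key, record).

    A constant CFD can be checked on one record in isolation, so it needs
    no grouping and fits the map side.
    """
    fixed = dict(record)
    for pattern, rhs, value in CONSTANT_CFDS:
        if matches(fixed, pattern):
            fixed[rhs] = value
    condition, lhs, _ = VARIABLE_CFD
    # Records outside the rule's condition get the key None and pass through.
    key = tuple(fixed.get(a) for a in lhs) if matches(fixed, condition) else None
    return key, fixed


def reduce_phase(key, records):
    """Repair variable-CFD violations inside one equivalence class.

    Assumption: the target value is the most frequent right-hand-side value.
    """
    if key is None:
        return records
    rhs = VARIABLE_CFD[2]
    target, _ = Counter(r.get(rhs) for r in records).most_common(1)[0]
    return [{**r, rhs: target} for r in records]


def repair(records):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for record in records:
        key, fixed = map_phase(record)
        groups[key].append(fixed)
    repaired = []
    for key, group in groups.items():
        repaired.extend(reduce_phase(key, group))
    return repaired
```

Calling `repair` on a list of dictionaries returns the repaired records grouped by equivalence class, so the output order can differ from the input order. The variable CFD needs the shuffle step because a violation is only visible when records sharing the same left-hand-side values are compared with one another.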