Source: Journal of Shandong University (Engineering Science), 2015, No. 5, pp. 22-28 (7 pages)
Funding: Beijing Municipal Education Commission fund project (PXM2011_014204_09_000232)
Abstract: To overcome the limitations of the single-machine serial probabilistic spreading algorithm in handling large-scale datasets, the algorithm was parallelized with the MapReduce programming model. The single-machine algorithm was split along its workflow, with each step implemented as one MapReduce program, and the input and output data of every step were stored in the Hadoop Distributed File System. Hit ratio was used to compare the parallelized probabilistic spreading algorithm against the global ranking algorithm, and the speedup of the parallelized algorithm was compared across different data volumes and numbers of nodes. The experimental results showed that the MapReduce-parallelized probabilistic spreading algorithm, deployed on a Hadoop cluster, achieved a significantly higher hit ratio than the global ranking algorithm and parallelized well, extending the data scale that the single-machine algorithm could handle and increasing its computation speed.
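Keywords: MapReduce; cloud computing platform; bipartite network; probabilistic spreading algorithm; distributed
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]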
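The abstract says each step of the algorithm maps to one MapReduce program but does not list the steps. As a rough illustration only, the sketch below assumes the commonly used two-step resource-allocation form of probabilistic spreading on a user-item bipartite network (items pass resource to users, then users pass it back to items). The helper `run_mapreduce` is a hypothetical in-memory stand-in for a Hadoop job, and all function names and sample data are assumptions rather than details from the paper. Edges are assumed to be unique (user, item) pairs.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job: map, shuffle, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def sum_by_key(pairs):
    """Job that adds up all values sharing the same key."""
    def mapper(pair):
        yield pair

    def reducer(key, values):
        yield key, sum(values)

    return dict(run_mapreduce(pairs, mapper, reducer))


def spread_items_to_users(edges, target):
    """Step 1 (assumed): each item the target collected splits 1 unit among its users."""
    def mapper(edge):
        user, item = edge
        yield item, user

    def reducer(item, users):
        if target in users:
            share = 1.0 / len(users)
            for user in users:
                yield user, share

    return run_mapreduce(edges, mapper, reducer)


def spread_users_to_items(edges, user_resource):
    """Step 2 (assumed): each user splits the received resource among their items."""
    def mapper(edge):
        user, item = edge
        yield user, item

    def reducer(user, items):
        resource = user_resource.get(user, 0.0)
        if resource > 0:
            share = resource / len(items)
            for item in items:
                yield item, share

    return run_mapreduce(edges, mapper, reducer)


def item_scores(edges, target):
    """Chain the jobs: final resource on every item for one target user."""
    user_resource = sum_by_key(spread_items_to_users(edges, target))
    return sum_by_key(spread_users_to_items(edges, user_resource))


def recommend(edges, target, top_n):
    """Rank items the target has not collected by final resource, highest first."""
    owned = {item for user, item in edges if user == target}
    scores = item_scores(edges, target)
    ranked = sorted(
        (item for item in scores if item not in owned),
        key=lambda item: (-scores[item], item),
    )
    return ranked[:top_n]


# Made-up sample data: (user, item) pairs.
edges = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "c"),
    ("u3", "b"), ("u3", "c"), ("u3", "d"),
]
print(recommend(edges, "u1", 2))
```

In a real deployment the intermediate results passed between functions here would instead be written to and read from HDFS between jobs, as the abstract describes. The total resource is conserved across both steps, which gives a simple sanity check on any implementation.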