Authors: Chen Zijun (陈子军)[1,2], Zhang Juanna (张娟娜)[1,2], Liu Wenyuan (刘文远)[1,2]
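Affiliations: [1] School of Information Science and Engineering, Yanshan University (燕山大学), Qinhuangdao, Hebei 066004; [2] Hebei Key Laboratory of Computer Virtual Technology and System Integration (河北省计算机虚拟技术与系统集成重点实验室), Qinhuangdao, Hebei 066004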
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2015, No. 10, pp. 2245-2251 (7 pages)
Abstract: Region-based spatial-textual similarity join is an important operation with wide real-world application, for example in social recommendation. As data volumes grow rapidly, however, a single machine cannot perform this operation efficiently on large-scale data. This paper therefore studies how to implement the operation in the MapReduce framework. The method consists of two phases: the first generates a global ordering of the textual signatures, and the second performs the similarity join. A data partitioning strategy based on the M-restrict-rectangle is proposed to reduce the amount of data replication, which both lowers the computation on each node and prunes some dissimilar object pairs. A grid-based duplication avoidance strategy is proposed so that similar object pairs are not computed repeatedly. Finally, experimental results verify the effectiveness of the proposed method.
Keywords: MapReduce; spatial-textual similarity join; data partitioning; M-restrict-rectangle
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
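The abstract describes the join phase only at a high level. As an illustration of the general idea of grid partitioning with duplicate avoidance, here is a minimal single-machine sketch in Python. It is not the paper's algorithm: it does not implement the M-restrict-rectangle partitioning or the first-phase global ordering, and every concrete choice in it is an assumption made for the example, namely point objects, a Euclidean distance threshold `eps`, Jaccard similarity with threshold `tau`, a grid with cell side `eps`, and a "smallest home cell reports the pair" rule. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import floor, hypot


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed textual measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def home_cell(x, y, eps):
    """Grid cell containing the point, for a grid with cell side eps."""
    return (floor(x / eps), floor(y / eps))


def map_phase(objects, eps):
    """Emulate the map step: replicate each object to its home cell and
    the 8 neighbouring cells, so any two objects within distance eps
    meet in at least one cell."""
    cells = defaultdict(list)
    for oid, x, y, tokens in objects:
        cx, cy = home_cell(x, y, eps)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                cells[(cx + dx, cy + dy)].append((oid, x, y, frozenset(tokens)))
    return cells


def reduce_phase(cells, eps, tau):
    """Emulate the reduce step: join within each cell. A pair is checked
    only in the smaller of the two objects' home cells, so it is
    computed and reported exactly once (assumed avoidance rule)."""
    results = []
    for cell, members in cells.items():
        for p, q in combinations(members, 2):
            hp = home_cell(p[1], p[2], eps)
            hq = home_cell(q[1], q[2], eps)
            if min(hp, hq) != cell:
                continue  # another cell is responsible for this pair
            close = hypot(p[1] - q[1], p[2] - q[2]) <= eps
            if close and jaccard(p[3], q[3]) >= tau:
                results.append(tuple(sorted((p[0], q[0]))))
    return sorted(results)


def similarity_join(objects, eps, tau):
    """Self-join: all pairs that are spatially close and textually similar."""
    return reduce_phase(map_phase(objects, eps), eps, tau)


if __name__ == "__main__":
    sample = [
        ("a", 0.9, 0.5, {"coffee", "wifi"}),
        ("b", 1.1, 0.5, {"coffee", "wifi", "cake"}),
        ("c", 5.0, 5.0, {"coffee", "wifi"}),
        ("d", 1.2, 0.6, {"pizza"}),
    ]
    print(similarity_join(sample, eps=1.0, tau=0.5))
```

In this sketch, objects `a` and `b` fall in different home cells and are replicated into six shared cells, yet the pair is evaluated in only one of them. Replicating every object to nine cells is the naive baseline; the abstract's stated motivation for the M-restrict-rectangle strategy is to reduce that kind of replication.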