基于MapReduce框架的海量数据相似性连接研究进展被引量：16

Similarity Joins on Massive Data Based on MapReduce Framework

机构地区：[1]东北大学信息科学与工程学院,沈阳110819 [2]国防科学技术大学信息系统与管理学院,长沙410073

出　　处：《计算机科学》2015年第1期1-5,27,共6页Computer Science

基　　金：国家重点基础研究发展计划(973)项目(2012CB316201);国家自然科学基金(61272179;61173028);教育部博士点基金(20120042110028);教育部-英特尔信息技术专项科研基金(MOE-INTEL-2012-06)资助

摘　　要：海量数据相似性连接作为海量数据处理的基本操作,在文本聚类、剽窃检测、实体解析等研究领域具有重要作用。另一方面,MapReduce编程模型因为具有良好的可扩放性、容错性和易用性,被广泛地应用于海量数据处理。因此,基于MapReduce框架的海量数据相似性连接查询技术成为海量数据处理领域的热点问题之一。首先,概括了海量数据固有特点和MapReduce编程框架的缺陷给现有相似性连接查询技术带来的巨大挑战;其次,提出了海量数据相似性连接的定义,按3种不同的分类标准对其进行了分类;接着,重点分析了集合、字符串和向量数据类型的海量相似性连接查询最新技术,并从效率和适用范围等方面分别对这些技术进行了比较;最后,讨论了海量数据相似性连接查询技术亟待解决的关键问题,并提出了一些有前景的解决方案。As a basic operation of large-scale data processing,similarity joins on large-scale data play an important role in document clustering,plagiarism detection,entity resolution and many other fields.On the other hand,the MapReduce programming model is widely applied to massive data processing because of its good scalability,fault tolerance and usability.Therefore,similarity joins on massive data based on MapReduce become one of the hot topics in the field of massive data processing.Firstly,big challenges of similarity join query introduced by rapid growth of data volume were generalized.Then,the definition of similarity joins on massive data was presented and similarity joins on massive data were classified according to three different standards.In addition,the current status of set,string and vector similarity joins on massive data were emphatically analyzed and compared from the aspects of efficiency,applicability and so on.Finally,the research focus and trend of this area were investigated and the promising solutions were suggested.

关键词：海量数据相似性连接 MAPREDUCE TOP-K

分类号：TP391[自动化与计算机技术—计算机应用技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于MapReduce框架的海量数据相似性连接研究进展被引量：16

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于MapReduce框架的海量数据相似性连接研究进展 被引量：16

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索

基于MapReduce框架的海量数据相似性连接研究进展被引量：16