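Affiliations: [1] School of Software, Dalian University of Technology, Dalian, Liaoning 116024 [2] School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2014, No. 12, pp. 2702-2710 (9 pages)
Funding: National Natural Science Foundation of China (61225010; 61432002; 61173162; 61300084); joint project of Microsoft Research Asia and the Computer Network Information Center, Chinese Academy of Sciences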
Abstract: In recent years, Skyline computing has played an increasingly important role in decision-making applications, and research on centralized (single-machine) processing is relatively mature. With the explosion of big data, Skyline computing now faces the problem of large-scale data processing. MapReduce is a parallel model widely used in data-intensive processing, and it requires that a task be decomposable. Existing partition methods for Skyline computing on MapReduce include grid partition and angle-based partition. Grid partition performs well only on low-dimensional datasets. Angle-based partition applies to both low-dimensional and high-dimensional datasets, but it needs a complex and time-consuming coordinate conversion before partitioning. In this paper, we decompose the dataset with hyperplane-projections-based partition, a method similar to angle-based partition that also applies to both low-dimensional and high-dimensional datasets but requires only a simple coordinate conversion before partitioning. Based on this partition, we propose MR-HPP (MapReduce with hyperplane-projections-based partition), an algorithm for Skyline computing on MapReduce, and, for the filter phase of MR-HPP, an effective filter method called PSF (presorting filter). Extensive comparative experiments on Hadoop show that the method is accurate, efficient, and stable.
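Keywords: Skyline computing; big data; MapReduce; hyperplane-projection partition; filtering
Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]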
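The abstract names three ingredients: a dominance-based Skyline, a projection-based partition, and a presorting filter. The sketch below illustrates the general idea on a single machine and is not the paper's MR-HPP or PSF implementation. All function names are hypothetical. It assumes smaller values are better, coordinates are non-negative, and, purely for illustration, that a point is assigned to a partition by its first coordinate after projection onto the hyperplane where the coordinates sum to 1.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_presorted(points: Sequence[Point]) -> List[Point]:
    """Skyline with a presorting step (assumed stand-in for a presorting filter).

    Sorting by coordinate sum puts likely dominators first, so most
    dominated points are rejected after few comparisons.
    """
    result: List[Point] = []
    for p in sorted(points, key=sum):
        if any(dominates(s, p) for s in result):
            continue
        # Guard against floating-point ties in the sort order.
        result = [s for s in result if not dominates(p, s)]
        result.append(p)
    return result


def partition_by_projection(points: Sequence[Point], num_partitions: int) -> List[List[Point]]:
    """Assign each point to a bucket by its projected first coordinate.

    Illustrative assumption: project p onto the hyperplane sum(x) = 1 by
    dividing by sum(p), then bucket on the first projected coordinate.
    """
    parts: List[List[Point]] = [[] for _ in range(num_partitions)]
    for p in points:
        total = sum(p)
        share = p[0] / total if total > 0 else 0.0
        idx = min(int(share * num_partitions), num_partitions - 1)
        parts[idx].append(p)
    return parts


def partitioned_skyline(points: Sequence[Point], num_partitions: int = 4) -> List[Point]:
    """Local skylines per partition (map-like), then one global merge (reduce-like)."""
    local = [skyline_presorted(part) for part in partition_by_projection(points, num_partitions)]
    merged = [p for part in local for p in part]
    return skyline_presorted(merged)
```

Merging the local skylines and filtering once more is sufficient because any point in the global Skyline is also undominated within its own partition, so it survives the local step.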