云环境下基于超球面投影分区的Skyline计算  被引量:5

Distributed Skyline Processing Based on Hypersphere Projection Partitioning on Cloud Environments

在线阅读下载全文

作  者:雷婷[1] 王涛[2] 曲武[3,4,5] 韩晓光[6] 

机构地区:[1]成都工业学院通信工程系成都611730 [2]湖南城市学院信息科学与工程学院益阳413000 [3]清华大学知识工程研究室北京100083 [4]北京启明星辰信息技术股份有限公司北京100193 [5]中关村科技园区海淀园企业博士后科研工作站北京100193 [6]北京科技大学计算机与通信工程学院北京100083

出  处:《计算机科学》2013年第6期164-171,共8页Computer Science

基  金:基于大规模复杂结构知识库的知识发现机理、模型与算法研究(60875029);多关系频繁模式挖掘模型、方法与一般架构的研究(60675030);基于多关系的模糊认知图挖掘模型、算法与评价机制研究(61175048)资助

摘  要:目前,Skyline查询在集中式数据库、分布式数据库、数据流及分类属性数据集上的良好应用前景,使其成为当前数据库界研究的重点和热点之一,受到了学术界和工业界的广泛关注,它作为一种重要的数据挖掘技术广泛应用于多目标优化、城市导航系统、用户偏好查询及约束决策、智能防御系统以及地理信息系统等领域。随着人类可以采集和利用的数据信息的急剧增长,如何处理大数据的Skyline查询成为急需解决的问题。针对云计算环境,在Map-Reduce框架下设计并实现了基于超球面投影分区的分布式Skyline算法HSPD-Skyline,其主要思想是通过对高维数据点的超平面投影映射,即由空间坐标转换为超球面坐标,可以有效提高分区内数据点的平均减枝力度,降低Skyline的计算代价。同时,使用基于空间分区树的启发式策略HA-SPT,进一步提高了HSPD-Skyline算法的处理效率。通过详细的理论分析和实验验证表明,在不考虑数据分布和进一步优化算法的条件下,提出的HSPD-Skyline算法的总体性能(可扩展性、Skyline查询时间等)优于同类算法。Recently, skyline processing has been receiving considerable attention due to its potential applications in many fields, including traditional database, distributed database, data stream and even the categorical database and so on. Both the academic and the industrial have paid much attention to it. As an important data mining technique, skyline proces- sing is of great significance for multi-objective optimization, urban navigation, multi-criteria decision making and prefe- rence query, trip planning, defense and intelligence systems and geographic information systems. In addition, the amount of data collected and used by human is developing at an astonishing speed. Therefore, how to process Skyline query of massive data is an urgent problem. Aiming at cloud computing applications, this paper designed and implemented dis- tributed Skyline processing based on hypersphere projection partitioning under the Map-Reduce framework, HSPD-Sky- line. It is showed that partitioning the data according to the hyperspherieal coordinates can increase the average pruning power of points within a partition, and reduce the cost of Skyline processing. The HSPD-Skyline algorithm also uses a heuristic strategy based on space partitioning tree, HA-SPT,to further improve the processing efficiency of the HSPD- Skyline algorithm. Finally, the theoretical analysis and experiment results illustrate that the HSPD-Skyline algorithm (Distributed Skyline Processing based on Hypersphere Projection Partitioning) consistently outperforms similar approa- ches for distributed skyline computation, regardless of data distribution, and further optimization strategies.

关 键 词:分布式Skyline计算 Map-Reduce框架 分区策略 HSPD-Skyline算法 

分 类 号:TP391[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象