Big Data Query Processing Method Based on Dynamic Distributed Clustering Algorithm

Cited by: 14

Authors: TANG Yun-le (唐运乐); WEI Xing-qiong (韦杏琼)

Affiliations: [1] School of Electromechanical and Information Engineering, Guangxi Vocational & Technical College, Nanning 530226, China; [2] School of Information Science and Engineering, Guangxi University for Nationalities, Nanning 530006, China

Source: Journal of Southwest China Normal University (Natural Science Edition), 2021, No. 5, pp. 134-139 (6 pages)

Funding: Natural Science Fund Project of the Guangxi Department of Education (2019KY1220).

Abstract: Existing big data spatial query processing methods suffer from long execution times and insufficiently accurate query results. To address these problems, a big data query processing method based on a dynamic distributed clustering algorithm is proposed. The method consists of three parts: data preprocessing, data clustering, and query processing. First, the input data are divided into multiple subsets and stored in RDD format across a group of machine nodes. Second, a dynamic clustering algorithm that combines partitioning and hierarchical clustering is used to cluster the data in a distributed manner on the Apache Spark platform. Finally, K-nearest-neighbor queries are used to obtain accurate and efficient query results. Experimental results show that the proposed method is scalable, provides high-quality results for spatial query processing, and has advantages over other query methods.

Keywords: big data; dynamic distributed clustering; query processing; Apache Spark

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
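
The three stages named in the abstract (splitting the data into subsets, clustering with a partition-plus-hierarchical hybrid, and answering K-nearest-neighbor queries) can be illustrated with a minimal single-process sketch. It is not the authors' algorithm and does not use Apache Spark: the round-robin split stands in for distributing subsets across nodes, per-subset k-means followed by centroid-based agglomerative merging is an assumed reading of the "partition and hierarchical hybrid", and the query step uses cluster radius bounds to skip clusters that cannot contain a nearer point. All function names and parameters (`split_into_subsets`, `local_kmeans`, `merge_clusters`, `build_index`, `knn_query`, `max_gap`) are hypothetical.

```python
import math
import random

def split_into_subsets(points, n_parts):
    """Stage 1 (assumed): round-robin split, standing in for per-node subsets."""
    return [list(points[i::n_parts]) for i in range(n_parts)]

def centroid(points):
    """Mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def local_kmeans(points, k, iters=10, seed=0):
    """Stage 2a (assumed): plain k-means on one subset; returns non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, min(k, len(points)))
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[idx].append(p)
        # Keep the old center when a group ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]

def merge_clusters(clusters, max_gap):
    """Stage 2b (assumed): repeatedly merge the two clusters with the closest
    centroids until the closest pair is farther apart than max_gap."""
    merged = [list(c) for c in clusters]
    while len(merged) > 1:
        gap, i, j = min(
            (math.dist(centroid(a), centroid(b)), i, j)
            for i, a in enumerate(merged)
            for j, b in enumerate(merged)
            if i < j
        )
        if gap > max_gap:
            break
        merged[i].extend(merged.pop(j))  # j > i, so index i stays valid
    return merged

def build_index(points, n_parts, k_local, max_gap):
    """Run stages 1 and 2 and return the global clusters."""
    local = []
    for part_id, subset in enumerate(split_into_subsets(points, n_parts)):
        local.extend(local_kmeans(subset, k_local, seed=part_id))
    return merge_clusters(local, max_gap)

def knn_query(clusters, q, k):
    """Stage 3 (assumed): exact kNN that visits clusters in order of a lower
    bound on their distance to q and stops once no cluster can improve."""
    summaries = []
    for members in clusters:
        ctr = centroid(members)
        radius = max(math.dist(ctr, p) for p in members)
        lower_bound = max(0.0, math.dist(q, ctr) - radius)
        summaries.append((lower_bound, members))
    summaries.sort(key=lambda s: s[0])

    best = []  # (distance, point) pairs, kept sorted and truncated to k
    for lower_bound, members in summaries:
        if len(best) >= k and best[-1][0] <= lower_bound:
            break  # every remaining cluster is at least this far away
        best.extend((math.dist(q, p), p) for p in members)
        best.sort()
        best = best[:k]
    return [p for _, p in best]
```

Calling `build_index(points, n_parts=4, k_local=3, max_gap=3.0)` and then `knn_query(clusters, (9.0, 9.5), 5)` returns the five points nearest to the query point. The radius bound keeps the result exact while letting distant clusters be skipped, which is one plausible way clustering can shorten query time.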
