Research on Big Data Clustering Based on Optimized X-means Algorithm Under Hadoop Platform  Cited by: 2


Authors: ZHANG Pengfei (张鹏飞) [1]; JIANG An (江岸) [1]; XIONG Nian (熊念) [2]

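Affiliations: [1] School of Computer, Guangdong Agriculture Industry Business Polytechnic, Guangzhou 510507, China; [2] School of Information Science and Technology, Jinan University, Guangzhou 510632, China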

Source: Computer Measurement & Control (《计算机测量与控制》), 2023, No. 12, pp. 284-289, 309 (7 pages)

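Funding: Guangdong Provincial Key Field Special Project for Regular Higher Education Institutions (New Generation Information Technology) (2023ZDZX1068, 2021ZDZX1138).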

Abstract: Existing clustering methods are limited in the scale of data they can process and often cluster data poorly. To address this, a big data clustering method based on an optimized X-means algorithm is proposed with the support of the Hadoop platform. The Hadoop platform architecture and functions are used to collect big data samples, and the initial sample data are preprocessed through missing-value compensation, noise filtering, and normalization. Big data clustering centers are selected, features are extracted from the cluster-center data and from all other data samples, and the feature similarity between each data sample and the cluster centers is calculated. With the similarity measurement results as the clustering criterion, the optimized X-means algorithm determines the type to which each data item belongs, completing the clustering of the big data. Clustering-effect test experiments show that, under both experimental conditions (with and without), the recall and precision of the optimized method improve by 4.75% and 4.5%, respectively, compared with traditional clustering methods, and the data produced by the optimized clustering method have a higher utilization rate.
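The preprocessing and similarity-based assignment steps named in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the abstract gives no formulas, so column-mean imputation, min-max scaling, and the similarity measure 1 / (1 + Euclidean distance) are assumptions chosen for illustration, and all function names are hypothetical. Noise filtering, Hadoop distribution, and the X-means step that decides the number of clusters are omitted.

```python
import math

def fill_missing(rows):
    """Replace None entries with the mean of the observed values in that column
    (an assumed stand-in for the abstract's "missing compensation")."""
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def min_max_normalize(rows):
    """Scale every column to [0, 1]; a constant column maps to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lows[j]) / spans[j] if spans[j] else 0.0
             for j, v in enumerate(row)]
            for row in rows]

def similarity(a, b):
    """Assumed distance-based similarity in (0, 1]; identical points score 1.0."""
    return 1.0 / (1.0 + math.dist(a, b))

def assign_to_centers(rows, centers):
    """Label each sample with the index of its most similar cluster center."""
    return [max(range(len(centers)), key=lambda k: similarity(row, centers[k]))
            for row in rows]

# Toy data: four two-feature samples, one with a missing value.
raw = [[1.0, 10.0], [2.0, None], [9.0, 30.0], [10.0, 28.0]]
clean = min_max_normalize(fill_missing(raw))
labels = assign_to_centers(clean, centers=[clean[0], clean[2]])
print(labels)  # [0, 0, 1, 1]
```

Here the first and third samples are simply picked as cluster centers. A full X-means procedure would instead start from a small number of centers and split clusters according to a model-selection score such as BIC.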

Keywords: Hadoop platform; optimized X-means algorithm; big data clustering

Classification number: TP301 [Automation and Computer Technology - Computer System Architecture]

 
