Authors: ZHANG Pengfei (张鹏飞) [1]; JIANG An (江岸) [1]; XIONG Nian (熊念) [2]
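Affiliations: [1] School of Computer, Guangdong Agriculture Industry Business Polytechnic, Guangzhou 510507, China; [2] School of Information Science and Technology, Jinan University, Guangzhou 510632, China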
Source: Computer Measurement & Control (《计算机测量与控制》), 2023, No. 12, pp. 284-289, 309 (7 pages)
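Funding: Guangdong Provincial Key-Field Special Projects for Regular Higher Education Institutions (New-Generation Information Technology), grants 2023ZDZX1068 and 2021ZDZX1138.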
Abstract: Existing clustering methods are limited in the scale of data they can process and often cluster poorly. To address both problems, a big data clustering method based on an optimized X-means algorithm is proposed on the Hadoop platform. The Hadoop platform architecture and its functions are used to collect big data samples, and the initial samples are preprocessed through missing-value compensation, noise filtering, and normalization. Clustering centers are then selected, features are extracted from the center data and from all other data samples, and the feature similarity between each sample and each clustering center is calculated. With the similarity measurement as the clustering criterion, the optimized X-means algorithm determines the class to which each data item belongs, which completes the clustering. Clustering-effect tests show that, under the two experimental conditions tested ("with" and "without"), the recall and precision of the optimized method improve on traditional clustering methods by 4.75% and 4.5% respectively, and that the data produced by the optimized clustering method has a higher utilization rate.
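The preprocessing and similarity-based assignment steps named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation and it is not X-means itself: it omits the Hadoop layer, the noise-filtering step, and the center-splitting logic, and the abstract does not state which similarity measure the paper uses. The mean-fill strategy, the min-max scaling, the similarity `1 / (1 + Euclidean distance)`, and all function names below are assumptions chosen for illustration.

```python
import math


def fill_missing(rows):
    """Missing-value compensation (assumed strategy): replace None with the column mean."""
    columns = list(zip(*rows))
    means = []
    for col in columns:
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def min_max_normalize(rows):
    """Normalization (assumed strategy): scale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def similarity(a, b):
    """Assumed similarity measure: 1 for identical points, approaching 0 as distance grows."""
    return 1.0 / (1.0 + math.dist(a, b))


def assign_to_centers(rows, centers):
    """Label each sample with the index of its most similar clustering center."""
    return [
        max(range(len(centers)), key=lambda k: similarity(row, centers[k]))
        for row in rows
    ]


# Toy data: four 2-D samples, one with a missing feature.
raw = [[1.0, 2.0], [1.2, None], [9.0, 10.0], [8.8, 9.6]]
clean = min_max_normalize(fill_missing(raw))
centers = [clean[0], clean[2]]  # hypothetical choice of two initial centers
labels = assign_to_centers(clean, centers)
print(labels)  # [0, 0, 1, 1]
```

In an X-means style method, an assignment step like this would be repeated while the algorithm decides whether to split each center, typically using a model-selection score; that part is left out here because the abstract gives no detail on how the optimized variant does it.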
Keywords: Hadoop platform; optimized X-means algorithm; big data clustering
Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]