Authors: Liu Weiming [1], Cui Yu, Mao Yimin [1], Liu Wei
Affiliations: [1] School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China; [2] School of Electronic Information Engineering, Gannan University of Science and Technology, Ganzhou, Jiangxi 341000, China
Source: Application Research of Computers (《计算机应用研究》), 2022, No. 11, pp. 3244-3251, 3257 (9 pages)
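Funding: 2020 Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0109605); National Natural Science Foundation of China (41562019).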
Abstract: In big data environments, parallel K-means algorithms suffer from poor clustering quality on high-dimensional data, unbalanced data partitions, and sensitivity to the initial centroids. To address these problems, this paper proposes MR-MSKCA, a parallel K-means algorithm based on MapReduce and MSSA. First, a dimensionality reduction strategy based on the Kendall correlation coefficient and a deep sparse autoencoder (DSAE), called DRKCAE, weights and extracts features from high-dimensional data, addressing the poor clustering caused by irrelevant features and sparse structure. Second, a uniform partition strategy based on two-stage mapping (UPS), a generalized-hyperplane partitioning scheme, divides the dataset into uniform partitions. Finally, a non-uniform mutation sparrow search algorithm (MSSA) obtains the clustering centroids for parallel K-means, addressing the sensitivity to initial centroids. In experiments on UCI datasets, MR-MSKCA reduced running time by 45.1%, 49.1%, and 59.8% and improved clustering quality by 19.2%, 22.8%, and 24% relative to MR-KNMF, MR-PGDLSH, and MR-GAPKCA, respectively. These results indicate that MR-MSKCA performs well when clustering big data and is suitable for big data cluster analysis in different scenarios.
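Keywords: MapReduce framework; DRKCAE strategy; UPS strategy; parallel clustering; MSSA algorithm
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]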
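Illustrative note: The abstract names a non-uniform mutation step inside MSSA but does not give its formula. The sketch below shows the classic Michalewicz-style non-uniform mutation operator for a single bounded coordinate, purely as an assumption about what such a step might look like; it is not taken from the paper. The function name `non_uniform_mutate`, the shape parameter `b`, and the 50/50 direction choice are all hypothetical choices for illustration.

```python
import random


def non_uniform_mutate(x, lower, upper, t, t_max, b=2.0, rng=random):
    """Mutate one coordinate x in [lower, upper] at iteration t of t_max.

    Assumed, textbook form of non-uniform mutation (not the paper's own):
    the perturbation can span the whole range early on and shrinks to zero
    as t approaches t_max, so the search moves from exploration to
    fine-tuning.
    """
    def delta(span):
        # r is uniform in [0, 1); the exponent decays from 1 to 0 over time,
        # so (1 - r ** exponent) shrinks toward 0 as t -> t_max.
        r = rng.random()
        return span * (1.0 - r ** ((1.0 - t / t_max) ** b))

    # Move toward the upper or the lower bound with equal probability.
    if rng.random() < 0.5:
        return x + delta(upper - x)
    return x - delta(x - lower)


if __name__ == "__main__":
    rng = random.Random(42)
    # Early iterations can jump far; late iterations stay close to x.
    for t in (0, 50, 99):
        print(t, non_uniform_mutate(4.0, 0.0, 10.0, t, 100, rng=rng))
```

In a sparrow-search-style optimizer, an operator of this kind would typically be applied to each coordinate of a candidate centroid after the position update; how MR-MSKCA actually integrates it is described only in the full paper.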