A Large Data Fast Clustering Algorithm Based on Stratified Sampling

Cited by: 6

Authors: Li Shunyong[1]; Zhang Yujia; Peng Xiaoqing; Cao Fuyuan[2,3]; Liu Enqian

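Affiliations: [1] School of Mathematical Sciences, Shanxi University, Taiyuan 030006, Shanxi, China; [2] School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China; [3] Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education, Taiyuan 030006, Shanxi, China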

Source: Computer Applications and Software, 2020, No. 10, pp. 256-261, 277 (7 pages)

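Funding: National Natural Science Foundation of China (61573229); Shanxi Province Basic Research Program (201701D121004); Shanxi Province Research Funding Project for Returned Overseas Scholars (2017-020); Taiyuan Science and Technology Plan R&D Project (2018140105000084).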

Abstract: To address the long iteration time of the K-means algorithm on large-scale data, a large data fast clustering algorithm based on stratified sampling (FCASS) is proposed. First, a stratification method is proposed that quickly partitions the original dataset into strata, so that data within a stratum are highly similar and data in different strata are less similar. Next, a sampling time function is introduced, and the optimal allocation of sample sizes across the strata is derived. Finally, the sample set is clustered with the K-means algorithm to obtain the final result. Experiments on 4 UCI datasets and 8 synthetic datasets show that FCASS achieves high clustering accuracy and runs fast on large-scale datasets.

Keywords: K-means; stratified sampling; sampling time; clustering accuracy; running speed

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
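
The three stages named in the abstract (stratify, sample each stratum, run K-means on the sample) can be illustrated with a minimal sketch. This is not the FCASS algorithm itself: the abstract does not specify the stratification method or the sampling time function, so the sketch substitutes assumptions of its own. The helper names (`stratify`, `stratified_sample`, `sample_then_cluster`) are hypothetical, strata are formed by quantile-binning the highest-variance feature, sample sizes use simple proportional allocation rather than the paper's optimal allocation, and the final step of assigning every original point to its nearest center is also an assumption.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


def stratify(X, n_strata):
    """Assumed stand-in for the paper's stratification step:
    quantile-bin the feature with the largest variance."""
    col = X[:, X.var(axis=0).argmax()]
    edges = np.quantile(col, np.linspace(0, 1, n_strata + 1)[1:-1])
    return np.searchsorted(edges, col, side="right")  # ids in 0..n_strata-1


def stratified_sample(X, strata, fraction, seed=0):
    """Proportional allocation (an assumption, not the paper's optimal
    allocation): draw the same fraction from every stratum."""
    rng = np.random.default_rng(seed)
    picked = []
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        n_s = max(1, int(round(fraction * len(idx))))
        picked.append(rng.choice(idx, size=n_s, replace=False))
    return np.concatenate(picked)


def sample_then_cluster(X, k, n_strata=10, fraction=0.1, seed=0):
    """Stratify, sample, cluster the sample, then label all points."""
    strata = stratify(X, n_strata)
    idx = stratified_sample(X, strata, fraction, seed)
    centers, _ = kmeans(X[idx], k, seed=seed)
    # Assumed final step: assign each original point to its nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1), idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(20, 1, (500, 2))])
    centers, labels, idx = sample_then_cluster(X, k=2)
    print("sample size:", len(idx), "of", len(X))
    print("centers:\n", centers)
```

The speed-up in a scheme like this comes from running the iterative K-means loop on the sample only; the full dataset is touched once for stratification and once for the final assignment. How much accuracy is retained depends on how well the strata capture the data's structure, which is the part the paper's own stratification and allocation steps are designed to handle.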
