Accurate distributed fuzzy KNN classification algorithm for big data (大数据下的分布式精确模糊KNN分类算法)

Cited by: 4

Authors: Zou Jinsong [1], Li Fang [2]

Affiliations: [1] School of Putian Big Data Industry, Chongqing Water Resources & Electric Engineering College, Chongqing 402160, China; [2] College of Computer Science, Chongqing University, Chongqing 400044, China

Source: Application Research of Computers (《计算机应用研究》), 2019, No. 12, pp. 3701-3704 (4 pages)

Funding: Chongqing Education Science "13th Five-Year Plan" 2017 Key Unfunded Project (2017-GX-181)

Abstract: This paper studies the efficiency of the K-nearest neighbor (KNN) method on large data sets and proposes a distributed exact fuzzy KNN classification algorithm based on the Spark framework, combining Spark's distributed map and reduce phases with fuzzy KNN. First, the class information of the training samples in each partition is fuzzified to obtain class membership degrees, which converts the training set into a fuzzy training set annotated with class memberships. Next, the KNN algorithm finds the k nearest neighbors of each test sample, using the previously computed class memberships. Finally, each test sample is classified by distance weighting. Experiments on data sets with millions of samples, together with comparisons against other algorithms, indicate that the proposed algorithm is feasible and effective.

Keywords: big data; distributed; Spark framework; class membership degree; fuzzy KNN algorithm

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
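
The three stages named in the abstract (fuzzify the training set, find the k nearest neighbors, classify by distance weight) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract gives no formulas, so the membership rule (0.51 for a sample's own class plus 0.49 times the neighbor class fraction) and the inverse-distance weighting with fuzzifier `m` are assumed from the standard fuzzy KNN formulation of Keller et al. Spark is replaced by plain Python lists standing in for partitions, and all function names are hypothetical.

```python
import heapq
import math
from collections import Counter
from functools import reduce


def local_top_k(query, partition, k, exclude_id=None):
    """Map step: the k closest (distance, id, payload) records in one partition."""
    scored = [
        (math.dist(query, feats), sid, payload)
        for sid, feats, payload in partition
        if sid != exclude_id
    ]
    return heapq.nsmallest(k, scored, key=lambda rec: rec[0])


def global_top_k(query, partitions, k, exclude_id=None):
    """Reduce step: merge per-partition candidates into the exact global top k."""
    local = [local_top_k(query, p, k, exclude_id) for p in partitions]
    return reduce(
        lambda a, b: heapq.nsmallest(k, a + b, key=lambda rec: rec[0]), local
    )


def fuzzify(partitions, classes, k):
    """Stage 1: replace each crisp label with class membership degrees.

    Assumed rule (Keller et al.): 0.51 + 0.49 * n_c / k for the sample's own
    class and 0.49 * n_c / k otherwise, where n_c counts neighbors of class c.
    """
    fuzzy_partitions = []
    for part in partitions:
        fuzzy_part = []
        for sid, feats, label in part:
            neighbors = global_top_k(feats, partitions, k, exclude_id=sid)
            counts = Counter(lbl for _, _, lbl in neighbors)
            n = max(len(neighbors), 1)
            membership = {
                c: 0.49 * counts[c] / n + (0.51 if c == label else 0.0)
                for c in classes
            }
            fuzzy_part.append((sid, feats, membership))
        fuzzy_partitions.append(fuzzy_part)
    return fuzzy_partitions


def classify(query, fuzzy_partitions, classes, k, m=2.0):
    """Stages 2 and 3: find k neighbors, then vote with distance weights."""
    neighbors = global_top_k(query, fuzzy_partitions, k)
    eps = 1e-12  # avoids division by zero when the query equals a sample
    weights = [1.0 / (d ** (2.0 / (m - 1.0)) + eps) for d, _, _ in neighbors]
    total = sum(weights)
    scores = {
        c: sum(w * mem[c] for w, (_, _, mem) in zip(weights, neighbors)) / total
        for c in classes
    }
    return max(scores, key=scores.get), scores


# Toy data: two partitions of (id, features, label) records, two classes.
partitions = [
    [(0, (0.0, 0.0), "a"), (1, (0.1, 0.2), "a"), (2, (5.0, 5.0), "b")],
    [(3, (0.2, 0.1), "a"), (4, (5.1, 4.9), "b"), (5, (4.9, 5.2), "b")],
]
classes = ["a", "b"]
fuzzy = fuzzify(partitions, classes, k=2)
label, scores = classify((0.05, 0.05), fuzzy, classes, k=3)
print(label, scores)
```

Because every partition returns its own k best candidates and the merge keeps the k smallest distances overall, the result matches a single-machine search, which is presumably the sense of "exact" in the title. In an actual Spark job the map and reduce steps would run over RDD partitions rather than Python lists.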
