Accurate distributed fuzzy KNN classification algorithm for big data (大数据下的分布式精确模糊KNN分类算法)  Cited by: 4


Authors: Zou Jinsong (邹劲松) [1], Li Fang (李芳) [2]

Affiliations: [1] School of Putian Big Data Industry, Chongqing Water Resources & Electric Engineering College, Chongqing 402160, China; [2] College of Computer Science, Chongqing University, Chongqing 400044, China

Source: Application Research of Computers (《计算机应用研究》), 2019, No. 12, pp. 3701-3704 (4 pages)

Funding: Key unfunded project (2017) of the Chongqing Education Science "13th Five-Year Plan" (2017-GX-181)

Abstract: To address the inefficiency of the K-nearest neighbor (KNN) method on large data sets, this paper proposes a distributed exact fuzzy KNN classification algorithm based on the Spark framework, combining Spark's distributed map and reduce phases with fuzzy KNN. First, the class information of the training samples in each partition is fuzzified to obtain class membership degrees, which converts the training set into a fuzzy training set annotated with class memberships. Next, KNN is used to find the k nearest neighbors of each test sample in the fuzzy training set, drawing on the previously computed class memberships. Finally, each test sample is classified by distance weighting. Experiments on data sets with millions of samples, together with comparisons against other algorithms, show that the proposed algorithm is feasible and effective.
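The three stages described in the abstract can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the authors' Spark implementation: the abstract does not give the membership formula, so the 0.51/0.49 rule and the fuzziness exponent m = 2 from the classic fuzzy KNN formulation are assumed here; partitions are simulated with ordinary lists rather than Spark RDDs; and every function name (fuzzify, local_top_k, classify) is hypothetical.

```python
import heapq
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fuzzify(train, classes, k):
    """Stage 1: turn each crisp label into a class-membership dict.

    Assumed rule: 0.49 * (share of the k neighbors in class c),
    plus 0.51 for the sample's own class.
    """
    fuzzy = []
    for i, (x, label) in enumerate(train):
        cands = ((dist(x, y), j) for j, (y, _) in enumerate(train) if j != i)
        neigh = heapq.nsmallest(k, cands)
        counts = Counter(train[j][1] for _, j in neigh)
        memb = {c: 0.49 * counts[c] / k for c in classes}
        memb[label] += 0.51
        fuzzy.append((x, memb))
    return fuzzy

def local_top_k(query, partition, k):
    """Map step: the k nearest fuzzy samples inside one partition."""
    cands = [(dist(query, x), memb) for x, memb in partition]
    return heapq.nsmallest(k, cands, key=lambda t: t[0])

def classify(query, partitions, classes, k, m=2.0):
    """Stages 2 and 3: merge local candidates, then vote by distance weight."""
    local = [local_top_k(query, p, k) for p in partitions]        # map
    merged = heapq.nsmallest(                                     # reduce
        k, (c for part in local for c in part), key=lambda t: t[0])
    # Inverse-distance weights; the floor avoids division by zero
    # when the query coincides with a training sample.
    weights = [1.0 / max(d, 1e-12) ** (2.0 / (m - 1.0)) for d, _ in merged]
    total = sum(weights)
    scores = {c: sum(w * mb[c] for w, (_, mb) in zip(weights, merged)) / total
              for c in classes}
    return max(scores, key=scores.get), scores

# Toy data: two well-separated clusters, split across two simulated partitions.
classes = ["a", "b"]
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.9, 5.2), "b")]
fuzzy_train = fuzzify(train, classes, k=2)
partitions = [fuzzy_train[0::2], fuzzy_train[1::2]]
label, scores = classify((0.1, 0.1), partitions, classes, k=3)
print(label, scores)
```

Because each partition returns its own k best candidates and the merge keeps the k smallest distances overall, the result matches a single-machine search over the whole training set, which is the sense in which such a scheme is exact rather than approximate.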

Keywords: big data; distributed Spark framework; class membership degree; fuzzy KNN algorithm

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
