Clustering Algorithm Based on Memetic Algorithm and Hesitant Fuzzy Sets and Its Distributed Parallel Implementation


Author: Wang Chaoying (王超英) [1] (Dongguan Polytechnic, Dongguan 523808, Guangdong, China)

Affiliation: [1] Dongguan Polytechnic, Dongguan 523808, Guangdong, China

Source: Computer Applications and Software (《计算机应用与软件》), 2021, No. 4, pp. 295-304 (10 pages)

Abstract: To improve the clustering accuracy and efficiency for massive, high-dimensional, small-sample datasets, this paper proposes a high-dimensional big data clustering system based on a recursive memetic algorithm and cloud-based distributed computing. The iterative clustering system is built on the Spark distributed computing platform and consists of two stages: recursive memetic feature reduction and density-based clustering. The first stage treats the clustering accuracy on gene microarray data as the primary objective and the number of features as the secondary objective, and reduces the feature space recursively. The second stage is a density-based clustering algorithm grounded in hesitant fuzzy set theory, which uses a weighted hesitant fuzzy set correlation coefficient to measure the distance between data points. Simulation experiments on both synthetic and clinical datasets show that the proposed algorithm achieves good clustering accuracy, scalability, and time efficiency.
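The abstract states only that feature reduction uses a recursive memetic algorithm with clustering accuracy as the primary objective and feature count as the secondary one. The sketch below shows one way to combine the two objectives lexicographically, using Python tuple comparison. It assumes a binary feature mask, truncation selection, uniform crossover, bit-flip mutation, and a bit-flip hill climber as the local-search ("meme") step; none of these operators is specified in the abstract. `accuracy_fn` and `toy_accuracy` are hypothetical stand-ins for running the clustering stage and scoring it against known labels, and the Spark distribution is omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Fitness is compared lexicographically: clustering accuracy first (primary
# objective), then fewer selected features (secondary objective).
Fitness = Tuple[float, int]
AccuracyFn = Callable[[Sequence[int]], float]


def fitness(mask: List[int], features: Sequence[int], accuracy_fn: AccuracyFn) -> Fitness:
    selected = [f for f, bit in zip(features, mask) if bit]
    if not selected:
        return (float("-inf"), 0)
    return (accuracy_fn(selected), -len(selected))


def local_search(mask: List[int], features: Sequence[int],
                 accuracy_fn: AccuracyFn) -> Tuple[List[int], Fitness]:
    """Bit-flip hill climbing: the local refinement applied to every individual."""
    best, best_fit = list(mask), fitness(mask, features, accuracy_fn)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = best.copy()
            cand[i] ^= 1
            cand_fit = fitness(cand, features, accuracy_fn)
            if cand_fit > best_fit:
                best, best_fit, improved = cand, cand_fit, True
    return best, best_fit


def memetic_select(features, accuracy_fn, rng, pop_size=10, generations=20, mutation_rate=0.1):
    n = len(features)
    pop = [local_search([rng.randint(0, 1) for _ in range(n)], features, accuracy_fn)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: ind[1], reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            (a, _), (b, _) = rng.sample(parents, 2)
            child = [a[i] if rng.random() < 0.5 else b[i] for i in range(n)]  # uniform crossover
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(local_search(child, features, accuracy_fn))
        pop = parents + children
    best_mask, best_fit = max(pop, key=lambda ind: ind[1])
    return [f for f, bit in zip(features, best_mask) if bit], best_fit


def recursive_reduce(features, accuracy_fn, rng=None, best=None):
    """Re-run the memetic search on the surviving features until it stops improving."""
    rng = rng or random.Random(0)
    selected, fit = memetic_select(features, accuracy_fn, rng)
    if best is not None and fit <= best[1]:
        return best
    if len(selected) == len(features):
        return selected, fit
    return recursive_reduce(selected, accuracy_fn, rng, (selected, fit))


# Hypothetical stand-in for "cluster, then score against known labels":
# only features 0, 3 and 5 carry signal in this toy example.
def toy_accuracy(selected: Sequence[int]) -> float:
    return len(set(selected) & {0, 3, 5}) / 3


selected, best_fit = recursive_reduce(list(range(10)), toy_accuracy)
print(selected, best_fit)  # [0, 3, 5] (1.0, -3)
```

Because the accuracy term dominates the tuple, a smaller feature set can win only when it keeps the same accuracy. In the toy run, adding a noise feature never raises accuracy, so the secondary objective alone prunes the noise features.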
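The abstract also says that the clustering stage is density-based and uses a weighted hesitant fuzzy set (HFS) correlation coefficient as its distance measure, but it does not give either formula. The following is a minimal sketch under several assumptions. Each data point is already encoded as an HFS, with one list of membership degrees in [0, 1] per feature. The coefficient follows a commonly cited weighted form, rho_w(A, B) = sum_i w_i (1/l_i) sum_j a_ij b_ij / sqrt(sum_i w_i (1/l_i) sum_j a_ij^2 * sum_i w_i (1/l_i) sum_j b_ij^2). Hesitant elements of unequal length are sorted, and the shorter one is padded with its largest value (an optimistic convention). The distance is taken as d = 1 - rho_w. The density-based step is plain DBSCAN with hypothetical parameters `eps` and `min_pts`. The abstract confirms none of these choices, and the weights and sample points are made up for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A hesitant fuzzy set over the features: one hesitant fuzzy element
# (a non-empty list of membership degrees in [0, 1]) per feature.
HFS = List[List[float]]


def _align(a: Sequence[float], b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Sort both elements; pad the shorter one with its largest value (assumed convention)."""
    a, b = sorted(a), sorted(b)
    n = max(len(a), len(b))
    return a + [a[-1]] * (n - len(a)), b + [b[-1]] * (n - len(b))


def weighted_corr(A: HFS, B: HFS, w: Sequence[float]) -> float:
    """Weighted HFS correlation coefficient; lies in [0, 1] for degrees in [0, 1]."""
    num = saa = sbb = 0.0
    for hA, hB, wi in zip(A, B, w):
        a, b = _align(hA, hB)
        l = len(a)
        num += wi * sum(x * y for x, y in zip(a, b)) / l
        saa += wi * sum(x * x for x in a) / l
        sbb += wi * sum(y * y for y in b) / l
    if saa == 0.0 or sbb == 0.0:
        return 0.0
    return num / math.sqrt(saa * sbb)


def hfs_distance(A: HFS, B: HFS, w: Sequence[float]) -> float:
    return 1.0 - weighted_corr(A, B, w)


def dbscan(points: List[HFS], w: Sequence[float], eps: float, min_pts: int) -> List[Optional[int]]:
    """Return a cluster id (0, 1, ...) per point, or -1 for noise."""
    n = len(points)
    dist = [[hfs_distance(points[i], points[j], w) for j in range(n)] for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbors)
    return labels


weights = [0.5, 0.5]  # hypothetical feature weights
points = [
    [[0.9, 0.8], [0.2]],
    [[0.85, 0.8], [0.25]],
    [[0.9], [0.2, 0.3]],
    [[0.1], [0.9, 0.8]],
    [[0.2, 0.1], [0.85]],
    [[0.15], [0.9]],
    [[0.5], [0.5]],
]
labels = dbscan(points, weights, eps=0.1, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Computing the full distance matrix is quadratic in the number of points. A distributed version such as the Spark system described in the abstract would presumably partition this pairwise step, but the abstract does not say how.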

Keywords: big data analytics; high-dimensional small-sample data; memetic algorithm; distributed computing; hesitant fuzzy sets

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
