Research on unbalanced big data classification based on optimized fuzzy C-means algorithm

Authors: ZHUO Liu-jun (卓柳俊); ZENG Xin-yi (曾心怡) (School of Information, Renmin University of China, Beijing 100872, China; Henan Academy of Social Sciences, Zhengzhou 450002, China)

Affiliations: [1] School of Information, Renmin University of China, Beijing 100872 [2] Henan Academy of Social Sciences, Zhengzhou 450002

Source: Information Technology (《信息技术》), 2024, No. 10, pp. 14-21, 29 (9 pages in total)

Funding: Henan Province Social Science Planning Project (2022CFX029).

Abstract: To address the classification of unbalanced big data, this paper proposes a classification algorithm based on an optimized fuzzy C-means algorithm. First, the C-means fuzzy crossover operator is calculated, the optimization function is defined, and the unbalanced gain of the big data is solved. Using the Spark classification platform, the value range of the condensed fuzzy nearest neighbor values of the big data samples is determined. The unbalanced threshold vector is then defined by enlarging the nearest neighbor values, which completes the classification process and the design of the unbalanced big data classification method based on the optimized fuzzy C-means algorithm. Experimental results show that the method completely separates the sampling length intervals of positive-example and negative-example information, effectively resolves the sample confusion caused by inaccurate classification of unbalanced big data, and meets practical application requirements.
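For orientation, the baseline that the paper builds on can be sketched as follows. This is a minimal pure-Python sketch of the standard (textbook) fuzzy C-means algorithm, not the optimized variant proposed in the paper: the abstract does not specify the crossover operator, optimization function, unbalanced gain, or threshold vector, so none of them is modeled here. The function name `fuzzy_c_means`, its parameters, and the toy data are illustrative assumptions.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means (NOT the paper's optimized variant).

    points: list of equal-length lists of floats.
    m: fuzzifier, must be greater than 1.
    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    exponent = 2.0 / (m - 1.0)

    # Random initial memberships, normalized so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            w_sum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / w_sum
                for d in range(dim)
            ])

        # Membership update from the distances to the current centers.
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # Point sits exactly on a center: split membership among
                # the coinciding centers to avoid division by zero.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
                continue
            new_u.append([
                1.0 / sum((dk / dj) ** exponent for dj in dists)
                for dk in dists
            ])

        # Stop once the largest membership change falls below tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break

    return centers, u


# Toy unbalanced data (hypothetical): 8 majority points, 2 minority points.
data = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [10.0], [10.2]]
centers, memberships = fuzzy_c_means(data, n_clusters=2)
# Hard labels: assign each point to its highest-membership cluster.
labels = [row.index(max(row)) for row in memberships]
```

The soft memberships are what distinguish fuzzy C-means from hard clustering; taking the highest-membership cluster per point, as in the last line, is one common way to turn them into class labels.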

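Keywords: optimized fuzzy C-means algorithm; unbalanced big data; crossover operator; chi-square test; condensed fuzzy nearest neighbor value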

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
