Authors: ZHUO Liu-jun (卓柳俊) [1]; ZENG Xin-yi (曾心怡) [2]
Affiliations: [1] School of Information, Renmin University of China, Beijing 100872, China; [2] Henan Academy of Social Sciences, Zhengzhou 450002, China
Source: Information Technology (《信息技术》), 2024, No. 10, pp. 14-21, 29 (9 pages)
Funding: Henan Provincial Social Science Planning Project (2022CFX029)
Abstract: To address the classification of unbalanced big data, this paper proposes a classification algorithm based on an optimized fuzzy C-means algorithm. First, the C-means fuzzy crossover operator is calculated, an optimization function is defined, and the unbalanced gain of the big data is solved. A Spark classification platform is then used to determine the value range of the condensed fuzzy nearest-neighbor values of the big data samples, and the unbalanced threshold vector is defined by enlarging those nearest-neighbor values. These steps complete the classification process and the design of the unbalanced big data classification method based on the optimized fuzzy C-means algorithm. The experimental results show that the method completely separates the sampling length intervals of positive-example and negative-example information, effectively resolves the confusion of information samples caused by inaccurate classification of unbalanced big data, and meets practical application requirements.
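Keywords: optimized fuzzy C-means algorithm; unbalanced big data; crossover operator; chi-square test; condensed fuzzy nearest-neighbor value
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]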
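
For orientation, the fuzzy C-means procedure that the abstract builds on can be sketched as follows. This is a minimal sketch of the standard, unoptimized algorithm only: it does not implement the paper's crossover operator, unbalanced gain, condensed nearest-neighbor values, or threshold vector, none of which are specified in this record. The function name `fuzzy_c_means`, its parameters, and the toy data are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means on a list of equal-length numeric tuples.

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which points[i] belongs to cluster k (each row sums to 1).
    Hypothetical helper for illustration; not the paper's optimized variant.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        max_change = 0.0
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            new_row = [
                1.0 / sum((dists[k] / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                for k in range(n_clusters)
            ]
            change = max(abs(a - b) for a, b in zip(new_row, u[i]))
            max_change = max(max_change, change)
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


# Toy unbalanced data (assumed): 20 majority points near 0..1.9, 3 minority points near 10.
points = [(0.1 * i,) for i in range(20)] + [(10.0,), (10.2,), (9.8,)]
centers, u = fuzzy_c_means(points, n_clusters=2)
```

Each row of `u` holds soft memberships rather than a hard label, which is the property an imbalance-aware method can exploit, for example by thresholding the minority-class membership differently from the majority-class one.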