Authors: GU Xian-feng (古险峰)[1], TANG Yong-li (汤永利)[2] (School of Information Engineering, Zhengzhou University of Industrial Technology, Zhengzhou, Henan 451100, China; College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China)
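Affiliations: [1] School of Information Engineering, Zhengzhou University of Industrial Technology, Zhengzhou, Henan 451100, China; [2] College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China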
Source: Computer Simulation (《计算机仿真》), 2023, No. 9, pp. 458-461, 481 (5 pages in total)
Abstract: Data mining can extract useful information from large, irregular datasets. Because most real-world data contain mixed (numerical and categorical) attributes, a swarm-intelligence-based clustering method for big data with mixed attributes is proposed to improve both the performance and the clustering quality of algorithms that handle such data. First, the dataset is divided into a numerical-attribute subset and a categorical-attribute subset. Each subset is clustered several times with a method suited to its attribute type, and the resulting partitions are fused with a consensus function, yielding a segmented fusion framework for mixed-attribute data. Next, to avoid the empty clusters that arise when cluster-center data coincide, the numerical attributes are weighted by information entropy, and the initial cluster center for each data object is then selected with an average-dissimilarity method. Finally, the cluster centers of the data samples to be classified are encoded. A fitness function evaluates the quality of each individual as a measure of clustering validity, and the global search ability of an improved particle swarm optimization (PSO) algorithm is used to find the optimal solution over the dataset, together with each particle's best position after it is updated in every iteration. Experimental results show that the method achieves high clustering quality and accuracy; it improves the particles' search efficiency and also enhances the robustness of the algorithm.
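The abstract does not name the consensus function used to fuse the repeated clusterings of the two attribute subsets. As an illustration only, the sketch below assumes a common choice, a co-association (evidence-accumulation) matrix that records the fraction of base partitions in which two objects share a cluster, and links objects whose fraction exceeds a threshold. `co_association`, `consensus_labels`, and the 0.5 threshold are hypothetical, not taken from the paper.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions in which each pair of objects shares a cluster label."""
    n, m = len(partitions[0]), len(partitions)
    ca = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    ca[i][j] += 1.0 / m
    return ca


def consensus_labels(partitions, threshold=0.5):
    """Link pairs whose co-association exceeds `threshold`; connected components become clusters."""
    ca = co_association(partitions)
    n = len(ca)
    parent = list(range(n))  # union-find forest over the objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if ca[i][j] > threshold:
            parent[find(i)] = find(j)
    # relabel component roots as 0, 1, 2, ... in order of first appearance
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    # hypothetical labels: two runs on the numerical subset, two on the categorical subset
    runs = [[0, 0, 1, 1, 2], [0, 0, 1, 1, 1], [1, 1, 0, 0, 2], [0, 0, 0, 1, 1]]
    print(consensus_labels(runs))  # [0, 0, 1, 1, 2]
```

Objects 0 and 1 agree in every run and objects 2 and 3 in three of four, so they are merged, while object 4 shares a cluster with object 3 in only half the runs and stays separate.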
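The entropy weighting of numerical attributes can be sketched with the standard entropy-weight method: an attribute whose normalized values are spread unevenly has lower entropy and therefore receives a larger weight. The abstract does not spell out the average-dissimilarity rule, so `initial_centers` assumes one plausible reading for illustration: the object with the smallest average weighted distance to all others becomes the first center, and each further center is the remaining object farthest from the centers already chosen. All function names here are hypothetical.

```python
import math


def entropy_weights(rows):
    """Standard entropy-weight method over numerical attributes; weights sum to 1."""
    n, d = len(rows), len(rows[0])  # assumes n >= 2 so that log(n) > 0
    raw = []
    for j in range(d):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        # min-max normalise; the tiny offset keeps every proportion strictly positive
        norm = [(v - lo) / (hi - lo) + 1e-12 if hi > lo else 1.0 for v in col]
        total = sum(norm)
        p = [v / total for v in norm]
        e = -sum(pi * math.log(pi) for pi in p) / math.log(n)  # normalised entropy in [0, 1]
        raw.append(max(0.0, 1.0 - e))  # low entropy (uneven spread) -> larger weight
    s = sum(raw)
    return [w / s for w in raw] if s > 0 else [1.0 / d] * d


def weighted_dist(a, b, w):
    """Entropy-weighted Euclidean distance between two numerical records."""
    return math.sqrt(sum(wj * (x - y) ** 2 for wj, x, y in zip(w, a, b)))


def initial_centers(rows, k, w):
    """Hypothetical average-dissimilarity seeding; returns the indices of k rows."""
    n = len(rows)
    avg = [sum(weighted_dist(rows[i], rows[j], w) for j in range(n)) / (n - 1)
           for i in range(n)]
    chosen = [min(range(n), key=lambda i: avg[i])]  # most central object first
    while len(chosen) < k:
        rest = [i for i in range(n) if i not in chosen]
        # next center: the remaining object farthest from all centers chosen so far
        chosen.append(max(rest, key=lambda i: min(weighted_dist(rows[i], rows[c], w)
                                                  for c in chosen)))
    return chosen
```

A constant attribute carries no information, so its entropy is 1 and its weight falls to 0; spreading the chosen seeds apart is one way to keep two centers from starting on identical data.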
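Encoding cluster centers as particles and scoring them with a fitness function can be illustrated with a basic global-best PSO. The abstract does not say what the "improved" PSO changes, so this sketch assumes a linearly decreasing inertia weight (a common variant) and uses the within-cluster sum of squared distances as the fitness to minimize. It handles numerical attributes only; `pso_centers`, its parameters, and their default values are hypothetical.

```python
import random


def sse(centers, rows):
    """Fitness: total squared distance from each row to its nearest center (lower is better)."""
    return sum(min(sum((x - c) ** 2 for x, c in zip(r, ctr)) for ctr in centers) for r in rows)


def decode(pos, k, d):
    """A particle position is k centers flattened into one vector of length k*d."""
    return [pos[i * d:(i + 1) * d] for i in range(k)]


def pso_centers(rows, k, n_particles=20, iters=100, c1=1.5, c2=1.5,
                w_max=0.9, w_min=0.4, seed=0):
    rng = random.Random(seed)
    d = len(rows[0])
    dim = k * d
    # per-coordinate search bounds taken from the data, repeated for each of the k centers
    lo = [min(r[j] for r in rows) for j in range(d)] * k
    hi = [max(r[j] for r in rows) for j in range(d)] * k
    # initialise each particle from k distinct data rows
    pos = [[v for idx in rng.sample(range(len(rows)), k) for v in rows[idx]]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pfit = [sse(decode(p, k, d), rows) for p in pos]
    g = min(range(n_particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for t in range(iters):
        # assumed "improvement": inertia weight falls linearly from w_max to w_min
        w = w_max - (w_max - w_min) * t / max(1, iters - 1)
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] = min(hi[j], max(lo[j], pos[i][j] + vel[i][j]))
            f = sse(decode(pos[i], k, d), rows)
            if f < pfit[i]:  # update the personal best, then the global best
                pbest[i], pfit[i] = pos[i][:], f
                if f < gfit:
                    gbest, gfit = pos[i][:], f
    return decode(gbest, k, d), gfit
```

Calling `pso_centers(rows, k)` returns the decoded best centers together with their fitness. A mixed-attribute version would also need a categorical dissimilarity term in the fitness, which this sketch leaves out.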
Keywords: mixed-attribute data; consensus function; information entropy; average dissimilarity; improved particle swarm optimization
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]