Authors: Guo Jing (郭婧) [1]; Geng Haijun (耿海军) [2]; Wu Yong (吴勇)
Affiliations: [1] Department of Electronic Information Engineering, Jinzhong Vocational and Technical College, Jinzhong 030600, Shanxi, China; [2] School of Automation and Software Engineering, Shanxi University, Taiyuan 030013, Shanxi, China
Source: Journal of Nanjing University of Science and Technology (《南京理工大学学报》), 2021, No. 3, pp. 314-319 (6 pages)
Funding: National Natural Science Foundation of China (61702315).
Abstract: To improve the clustering accuracy of data mining, a K-means clustering algorithm based on bacterial foraging optimization is proposed. The K-means algorithm is used to build the data clustering model. Multiple cluster center coordinates are set according to the number of cluster categories. A distance threshold is set for category membership, and the distance between each point to be clustered and every center is then computed to assign that point to a category. A fitness function is built from the distances between the points taking part in clustering and their respective centers. The bacterial foraging optimization algorithm is introduced to optimize the K-means clustering process. Through repeated dispersal, reproduction, and chemotaxis operations on the bacteria, the fitness of the clustering is improved continuously until either the maximum number of operations or the minimum clustering accuracy threshold is reached, yielding a stable data clustering algorithm. Experiments show that good clustering performance can be obtained by setting the numbers of dispersal and chemotaxis steps reasonably and fine-tuning the attraction and repulsion parameters of the bacterial foraging algorithm. K-means and the proposed bacterial-foraging-optimized K-means algorithm were each used to cluster six different data sets in simulation. The average clustering accuracy of the proposed algorithm is above 92% on every data set. On the UCI mixed data set, once clustering has stabilized, the clustering standard deviation of the proposed algorithm is markedly better than that of K-means; in addition, the proposed algorithm takes about 70 s to cluster 5000 mixed samples, whereas K-means takes about 93 s.
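The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the fitness function (sum of point-to-nearest-center distances), the acceptance rule in the chemotaxis step, and all names and default parameters (`optimize_centers`, `n_bacteria`, `p_disperse`, and so on) are assumptions made for illustration. The distance threshold and the attraction and repulsion terms mentioned in the abstract are omitted.

```python
import math
import random


def assign_clusters(points, centers):
    """Label each point with the index of its nearest center (Euclidean)."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]


def fitness(points, centers):
    """Assumed fitness: sum of distances to the nearest center (lower is better)."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def chemotaxis_step(points, centers, step_size, rng):
    """Move every center one step in a random unit direction.

    The move is kept only if it lowers the fitness (an assumed rule).
    """
    dim = len(centers[0])
    candidate = []
    for c in centers:
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        candidate.append(
            tuple(ci + step_size * d / norm for ci, d in zip(c, direction))
        )
    if fitness(points, candidate) < fitness(points, centers):
        return candidate
    return centers


def optimize_centers(points, k, n_bacteria=10, n_chemotaxis=50,
                     n_reproduction=4, n_dispersal=2, p_disperse=0.25,
                     step_size=0.1, seed=0):
    """Search for k cluster centers with nested dispersal, reproduction,
    and chemotaxis loops. Each bacterium is one candidate set of k centers."""
    rng = random.Random(seed)
    population = [rng.sample(points, k) for _ in range(n_bacteria)]
    best = min(population, key=lambda c: fitness(points, c))
    for _ in range(n_dispersal):
        for _ in range(n_reproduction):
            for _ in range(n_chemotaxis):
                population = [
                    chemotaxis_step(points, c, step_size, rng)
                    for c in population
                ]
            # Reproduction: the fitter half survives and is duplicated.
            population.sort(key=lambda c: fitness(points, c))
            survivors = population[:(n_bacteria + 1) // 2]
            population = (survivors * 2)[:n_bacteria]
        best = min(population + [best], key=lambda c: fitness(points, c))
        # Dispersal: some bacteria restart from randomly chosen data points.
        population = [
            rng.sample(points, k) if rng.random() < p_disperse else c
            for c in population
        ]
    return best


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    centers = optimize_centers(data, k=2)
    print(assign_clusters(data, centers))
```

The sketch keeps the best candidate seen before each dispersal, so a random restart cannot discard a good solution. A fuller version would add the stopping threshold on clustering accuracy described in the abstract.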
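Keywords: data mining; bacterial foraging optimization; K-means; clustering; fitness function; hierarchical clustering; particle swarm optimization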
Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]