K-means clustering algorithm based on bacteria foraging optimization (基于菌群优化的K均值聚类算法研究)

Cited by: 13

Authors: Guo Jing (郭婧) [1]; Geng Haijun (耿海军) [2]; Wu Yong (吴勇)

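Affiliations: [1] Department of Electronic Information Engineering, Jinzhong Vocational and Technical College, Jinzhong 030600, Shanxi, China; [2] School of Automation and Software Engineering, Shanxi University, Taiyuan 030013, Shanxi, China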

Source: Journal of Nanjing University of Science and Technology (《南京理工大学学报》), 2021, No. 3, pp. 314-319 (6 pages)

Funding: National Natural Science Foundation of China (61702315).

Abstract: To improve clustering accuracy in data mining, a K-means clustering algorithm based on bacteria foraging optimization is proposed. The K-means algorithm is used to build the data clustering model: multiple cluster-center coordinates are set according to the number of clusters, a distance threshold is set for cluster membership, and the distance from each point to be clustered to every center is computed to assign the point to a cluster. A fitness function is built from the distances between the clustered points and their respective centers. The bacteria foraging optimization algorithm is then introduced to optimize the K-means clustering process. Through repeated dispersal, reproduction, and chemotaxis operations, the clustering fitness is improved continuously until the maximum number of operations or the minimum clustering accuracy threshold is reached, yielding a stable clustering algorithm. Experiments show that good clustering performance can be obtained by setting the numbers of dispersal and chemotaxis steps reasonably and fine-tuning the attraction and repulsion parameters of the bacteria foraging algorithm. K-means and the proposed algorithm were each used to cluster six different data sets in simulation; the average clustering accuracy of the proposed algorithm exceeds 92% on all of them. On the UCI mixed data set, once clustering has stabilized, the clustering standard deviation of the proposed algorithm is markedly better than that of K-means. In addition, the proposed algorithm takes about 70 s to cluster 5000 mixed samples, whereas K-means takes about 93 s.
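The abstract gives no formulas or parameter values, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the fitness is the sum of distances from each point to its nearest center (lower is better), treats each bacterium as one candidate set of k centers, and uses a simplified chemotaxis step (a random tumble that is kept only if it improves fitness). The distance threshold and the attraction and repulsion terms mentioned in the abstract are omitted. All names and defaults (`bfo_kmeans`, `n_bacteria`, `step`, `p_dispersal`, and so on) are hypothetical.

```python
import math
import random


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]


def fitness(points, centers):
    """Assumed fitness: total distance from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def bfo_kmeans(points, k, n_bacteria=10, n_dispersal=2, n_reproduction=3,
               n_chemotaxis=20, step=0.5, p_dispersal=0.25, seed=0):
    """Search for k centers with a simplified bacteria-foraging loop.

    n_bacteria is assumed to be even so the colony size stays constant.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    def random_centers():
        # A bacterium's position: k centers drawn from the data points.
        return [list(p) for p in rng.sample(points, k)]

    colony = [random_centers() for _ in range(n_bacteria)]
    best = [row[:] for row in min(colony, key=lambda c: fitness(points, c))]
    best_cost = fitness(points, best)

    for _ in range(n_dispersal):
        for _ in range(n_reproduction):
            health = [0.0] * n_bacteria  # accumulated cost per bacterium
            for _ in range(n_chemotaxis):
                for i, centers in enumerate(colony):
                    cost = fitness(points, centers)
                    # Tumble: random unit direction over all k * dim coordinates.
                    direction = [[rng.gauss(0, 1) for _ in range(dim)]
                                 for _ in range(k)]
                    norm = math.sqrt(sum(v * v for row in direction
                                         for v in row)) or 1.0
                    trial = [[c + step * d / norm for c, d in zip(crow, drow)]
                             for crow, drow in zip(centers, direction)]
                    trial_cost = fitness(points, trial)
                    if trial_cost < cost:  # keep the move only if it helps
                        colony[i], cost = trial, trial_cost
                    health[i] += cost
                    if cost < best_cost:
                        best, best_cost = [row[:] for row in colony[i]], cost
            # Reproduction: the healthier half (lower accumulated cost) splits.
            order = sorted(range(n_bacteria), key=lambda i: health[i])
            survivors = [colony[i] for i in order[: n_bacteria // 2]]
            colony = [[row[:] for row in c] for c in survivors + survivors]
        # Dispersal: some bacteria are relocated to random positions.
        colony = [random_centers() if rng.random() < p_dispersal else c
                  for c in colony]

    return best, assign(points, best), best_cost
```

Calling `bfo_kmeans(points, k=2)` on a list of coordinate tuples returns the best centers found, the cluster label of each point, and the corresponding fitness value. The best solution is tracked separately from the colony, so a dispersal step cannot discard it.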

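Keywords: data mining; bacteria foraging optimization; K-means clustering; fitness function; hierarchical clustering; particle swarm optimization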

Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
