Distributed K-Means clustering algorithm based on Fisher discriminant ratio

Cited by: 5


Author: 彭长生[1]

Affiliation: [1] School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China

Source: Journal of Jiangsu University: Natural Science Edition, 2014, No. 4, pp. 422-427 (6 pages)

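Funding: National Science and Technology Innovation Fund project (10C26213200946); Jiangsu Province Science and Technology Innovation Project (BC2009265); Zhenjiang Industrial Support Project (GY2012007)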

Abstract: To address the inability of centralized clustering algorithms to handle massive data sets, a distributed K-Means clustering algorithm is proposed in which the confidence radius is determined by the Fisher discriminant ratio at local nodes. The computing power, storage capacity, and bandwidth of the nodes in a P2P network are used to spread the time and space costs of clustering across the nodes. The Fisher linear discriminant is applied to distinguish the dense and sparse regions of the data that a node holds for the same cluster; from this the confidence radius is determined quickly and used to guide the next clustering step, so that distributed clustering is accelerated while clustering accuracy is maintained. The algorithm was evaluated by numerical simulation and by experiments on real data. The results show that, compared with the DFEKM clustering algorithm, the proposed algorithm achieves a good balance between clustering quality and clustering speed across different data distributions, which indicates better robustness.
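The abstract does not give the formula used to derive the confidence radius, so the following is only a minimal sketch of one plausible reading, not the paper's procedure. It assumes the standard two-class Fisher discriminant ratio, (m1 - m2)^2 / (v1 + v2), applied to the distances of one node's local points from a cluster centroid, and it picks the distance threshold that best separates a dense inner group from a sparse outer group. The function names `fisher_ratio` and `confidence_radius` are hypothetical.

```python
from math import dist
from statistics import mean, pvariance


def fisher_ratio(inner, outer):
    """Two-class Fisher discriminant ratio for 1-D samples:
    squared difference of means over the sum of variances."""
    m1, m2 = mean(inner), mean(outer)
    v1, v2 = pvariance(inner), pvariance(outer)
    # Small constant avoids division by zero when both groups are degenerate.
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-12)


def confidence_radius(points, centroid):
    """Hypothetical helper (an assumption, not taken from the paper).

    Sorts the distances of the local points from the centroid, tries every
    split into an inner and an outer group, and returns the midpoint of the
    split with the highest Fisher ratio. `points` must be non-empty.
    """
    d = sorted(dist(p, centroid) for p in points)
    best_ratio, best_radius = -1.0, d[-1]
    for i in range(1, len(d)):
        ratio = fisher_ratio(d[:i], d[i:])
        if ratio > best_ratio:
            best_ratio = ratio
            best_radius = (d[i - 1] + d[i]) / 2
    return best_radius


# Three points packed near the centroid and three scattered far away.
local_points = [(0.1, 0), (0.2, 0), (0.3, 0), (5, 0), (6, 0), (7, 0)]
radius = confidence_radius(local_points, (0, 0))
```

In this sketch the returned radius falls in the gap between the dense and the sparse points. How such a radius then guides the next distributed clustering step is described only in the full paper and is not modeled here.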

Keywords: P2P network; clustering algorithm; distributed clustering; Fisher linear discriminant; confidence radius

Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]

 
