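Author: Peng Changsheng (彭长生)[1]
Affiliation: [1] School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
Source: Journal of Jiangsu University: Natural Science Edition (《江苏大学学报(自然科学版)》), 2014, No. 4, pp. 422-427 (6 pages)
Funding: National Science and Technology Innovation Fund project (10C26213200946); Jiangsu Province Science and Technology Innovation Project (BC2009265); Zhenjiang Industrial Support Project (GY2012007)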
Abstract: To address the inability of centralized clustering algorithms to handle massive data sets, a distributed K-Means clustering algorithm is proposed that determines a confidence radius using the Fisher discriminant. The computing and storage capacity of each node, together with the network bandwidth, is used to spread the time and space cost of clustering across the nodes of the P2P network. The Fisher linear discriminant is applied at each node to find the dense and sparse distributions of the data within the same cluster. The result is used to determine the confidence radius quickly and to guide the next clustering step, which speeds up distributed clustering while maintaining clustering accuracy. The algorithm was evaluated by numerical simulation and by experiments on real data. The results show that, compared with the DFEKM clustering algorithm, the proposed algorithm achieves a good balance between clustering quality and clustering speed across different data distributions, which indicates better robustness.
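Keywords: P2P network; clustering algorithm; distributed clustering; Fisher linear discriminant; confidence radius
Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]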