Clustering Algorithm over Distributed Data Streams (cited by: 2)

Source: Computer Engineering and Design (《计算机工程与设计》), 2011, No. 8, pp. 2708-2711, 2763 (5 pages)

Funding: National High Technology Research and Development Program of China (863 Program), grant 2008AA011001

Abstract: Data in distributed data streams overlap and may be incomplete, and clustering them should incur a low communication cost. To address both points, the paper proposes DAM-Distream, a distributed data stream clustering algorithm that combines density-based and model-based clustering. The algorithm uses a Gaussian mixture model (GMM) to describe the distribution of the data streams flowing into each local site, which compresses the data volume effectively and reflects the overlap among the distributed streams well. Because the EM algorithm used to estimate the GMM parameters is sensitive to its initial values, DAM-Distream first applies Hoeffding bound theory and a density-based algorithm to pre-cluster the data stream and obtain reasonably accurate initial parameters. EM then refines the model iteratively, and the local models are uploaded to the central site, where a strategy of merging similar models yields the global model. Simulation results show that DAM-Distream effectively overcomes the shortcomings of the EM algorithm, obtains better GMM parameters, and improves the clustering quality of data streams in a distributed environment while reducing the communication cost of the system.
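The abstract names two building blocks without giving formulas: the Hoeffding bound and the merging of similar local models at the central site. The following is a minimal sketch of both for one-dimensional Gaussian components, not the paper's implementation. The `Component` class, the `merge_similar` function, the `mean_tol` threshold, and the sample numbers are all hypothetical. The merge rule shown is standard moment matching, which may differ from the strategy DAM-Distream actually uses, and how the paper applies the Hoeffding bound is not stated in the abstract.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One 1-D Gaussian component summarised as (weight, mean, variance)."""
    weight: float
    mean: float
    var: float


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability at least 1 - delta, the mean
    of n independent samples of a variable with the given range lies within
    the returned epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def merge(a: Component, b: Component) -> Component:
    """Moment-matched merge: keeps the total weight, mean, and variance
    of the two-component mixture formed by a and b."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    second_moment = (a.weight * (a.var + a.mean ** 2)
                     + b.weight * (b.var + b.mean ** 2)) / w
    return Component(w, mean, second_moment - mean ** 2)


def merge_similar(components, mean_tol):
    """Assumed similarity test: scanning components in order of mean, merge
    a component into the previous one when their means differ by less than
    mean_tol. Weights are renormalised to sum to 1 at the end."""
    merged = []
    for c in sorted(components, key=lambda comp: comp.mean):
        if merged and abs(c.mean - merged[-1].mean) < mean_tol:
            merged[-1] = merge(merged[-1], c)
        else:
            merged.append(c)
    total = sum(c.weight for c in merged)
    return [Component(c.weight / total, c.mean, c.var) for c in merged]


# Hypothetical local models uploaded by two sites (made-up numbers).
site_a = [Component(0.5, 0.0, 1.0), Component(0.5, 10.0, 1.0)]
site_b = [Component(1.0, 0.2, 1.0)]
global_model = merge_similar(site_a + site_b, mean_tol=1.0)
```

Each site sends only a few numbers per component rather than raw points, which is the source of the communication savings the abstract describes. In this sketch the overlapping components near 0 collapse into one global component, while the component at 10 is kept separate.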

Keywords: distributed data streams; clustering; density-based; model-based; data mining

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
