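Affiliation: [1] National Digital Switching System Engineering and Technological Research Center, Zhengzhou, Henan 450002
Source: Computer Engineering and Design (《计算机工程与设计》), 2011, No. 8, pp. 2708-2711, 2763 (5 pages)
Funding: National High Technology Research and Development Program of China (863 Program), grant 2008AA011001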
Abstract: Data in distributed data streams can overlap or be incomplete, and clustering them should incur a low communication cost. To address both issues, this paper proposes DAM-Distream, a distributed data stream clustering algorithm that combines density-based and model-based clustering. The algorithm uses a Gaussian mixture model to describe the distribution of the data stream flowing into each local site, which compresses the data effectively and captures the overlap between distributed streams well. Because the EM algorithm that estimates the model parameters is sensitive to its initial values, DAM-Distream first applies Hoeffding bound theory and a density-based algorithm to pre-cluster the stream and obtain reasonably accurate initial parameters. The EM algorithm is then used for iterative clustering. Finally, the local models are uploaded to the central site, where a strategy of merging approximate models produces the global model. Simulation results show that DAM-Distream effectively overcomes the shortcomings of the EM algorithm, obtains better Gaussian mixture model parameters, and improves the clustering quality of data streams in distributed systems while reducing the communication cost of the system.
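The abstract does not spell out the merging rule used at the central site. As a rough illustration only, the sketch below merges one-dimensional Gaussian components from several local models by moment matching whenever their means lie close together. The names `Component`, `merge_pair`, `merge_models`, and the threshold `max_gap` are hypothetical and not taken from the paper, and the sketch assumes every site contributes an equal share of the data.

```python
from dataclasses import dataclass


@dataclass
class Component:
    weight: float  # mixing weight of the component within its model
    mean: float
    var: float


def merge_pair(a: Component, b: Component) -> Component:
    """Moment-matching merge: keeps the pair's total weight, mean, and variance."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    var = (a.weight * (a.var + (a.mean - mean) ** 2)
           + b.weight * (b.var + (b.mean - mean) ** 2)) / w
    return Component(w, mean, var)


def merge_models(local_models, max_gap=1.0):
    """Greedy merge of local mixtures into one global mixture.

    Assumption (not from the paper): two components are 'approximate'
    when their means differ by at most max_gap pooled standard deviations.
    """
    merged = []
    for model in local_models:
        for comp in model:
            for i, existing in enumerate(merged):
                scale = (0.5 * (comp.var + existing.var)) ** 0.5
                if abs(comp.mean - existing.mean) <= max_gap * scale:
                    merged[i] = merge_pair(existing, comp)
                    break
            else:
                # No sufficiently close component found: keep it as a new one.
                merged.append(comp)
    # Renormalise so the global mixing weights sum to 1.
    total = sum(c.weight for c in merged)
    return [Component(c.weight / total, c.mean, c.var) for c in merged]
```

Only the component parameters travel to the central site in this sketch, which is the sense in which a model-based summary can lower communication cost compared with shipping raw points.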
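Keywords: distributed data streams; clustering; density-based; model-based; data mining
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]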