A Data Stream Clustering Algorithm Based on a Potential Field Model

Cited by: 3

Authors: Shu Yue, Xie Qing, Liu Yongjian [1], Tang Lingli [1] (School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, Hubei, China)

Affiliation: [1] School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, Hubei, China

Source: Computer Applications and Software (《计算机应用与软件》), 2022, No. 11, pp. 222-230, 237 (10 pages in total)

Funding: National Natural Science Foundation of China (61602353).

Abstract: Most traditional data stream clustering algorithms use distance as the similarity measure, which makes them sensitive to noise and degrades clustering results. To address this, the paper proposes PHAStream, a data stream clustering algorithm based on a potential field model. It combines the online/offline two-stage data stream clustering framework with PHA, a potential-based hierarchical clustering algorithm, and can handle noise effectively. In the initialization phase, PHA is used to obtain the initial micro-clusters. In the online phase, each newly arrived data point updates the micro-clusters under a similarity measure that fuses potential energy and distance; at regular intervals, a pruning strategy deletes expired micro-clusters and the types of all micro-clusters are adjusted. In the offline phase, an improved PHA algorithm is applied to all normal micro-clusters to obtain the final clustering result. Comparative experiments on two real data sets show that PHAStream improves clustering quality, clustering purity, and time efficiency.

Keywords: data mining; data stream clustering; potential field model; data summary

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
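
The online phase summarized in the abstract (absorb each arriving point into a micro-cluster, periodically prune expired micro-clusters) can be illustrated with a minimal sketch. The record does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the pairwise potential `-w / max(r, delta)`, the exponential weight decay, the distance gate `radius`, and all names (`MicroCluster`, `insert_point`, `prune`) are hypothetical and are not taken from PHAStream itself.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    center: list          # running weighted mean of absorbed points
    weight: float         # decayed count of absorbed points
    last_update: int      # timestamp of the last decay/update


def pair_potential(point, mc, delta):
    # Assumed form: -w / max(r, delta); delta caps the potential at short range.
    r = math.dist(point, mc.center)
    return -mc.weight / max(r, delta)


def decay(mc, now, lam):
    # Assumed exponential fading of the weight since the last update.
    mc.weight *= 2.0 ** (-lam * (now - mc.last_update))
    mc.last_update = now


def insert_point(clusters, point, now, radius=1.0, delta=0.1, lam=0.01):
    """Absorb `point` into the most attractive nearby micro-cluster, or open a new one."""
    for mc in clusters:
        decay(mc, now, lam)
    # Distance acts as a gate; potential ranks the candidates that pass it.
    nearby = [mc for mc in clusters if math.dist(point, mc.center) <= radius]
    if not nearby:
        mc = MicroCluster(center=list(point), weight=1.0, last_update=now)
        clusters.append(mc)
        return mc
    best = min(nearby, key=lambda mc: pair_potential(point, mc, delta))
    new_weight = best.weight + 1.0
    best.center = [(c * best.weight + x) / new_weight
                   for c, x in zip(best.center, point)]
    best.weight = new_weight
    return best


def prune(clusters, now, lam=0.01, min_weight=0.5):
    """Drop micro-clusters whose decayed weight fell below `min_weight`."""
    for mc in clusters:
        decay(mc, now, lam)
    clusters[:] = [mc for mc in clusters if mc.weight >= min_weight]
```

In this sketch a lower (more negative) potential means a stronger pull, so a heavy micro-cluster can win over a slightly closer but lighter one. That is one simple way to combine the two signals; the paper's actual fused measure, its micro-cluster types, and its offline PHA step are not reproduced here.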
