A Connection-Number-Based Clustering Algorithm for Position-Uncertain Data Streams  Cited by: 7

Clustering Algorithm for Position Uncertain Data Expressed by Connection Number


Authors: SHI Ling-juan; HUANG De-cai [1]

Affiliation: [1] College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China

Source: Journal of Chinese Computer Systems (小型微型计算机系统), 2020, No. 2, pp. 361-368 (8 pages)

Funding: Supported by the Zhejiang Provincial Basic Public Welfare Research Program (GG19E090005).

Abstract: Position uncertainty is a new type of data uncertainty in research on clustering algorithms for uncertain data streams. Existing uncertain data models cannot describe or process position-uncertain data well. This paper therefore introduces a connection-number-based model of position-uncertain data, a connection distance function, and the density reachability of micro-clusters, and on this basis proposes UCNStream (Uncertain Connection Number Stream), a clustering algorithm for position-uncertain data streams expressed by connection numbers. The algorithm adopts an online/offline two-stage processing framework, uses an initialization strategy based on the density-peak idea, and defines a new micro-cluster clustering feature vector that can be maintained dynamically. Micro-clusters are maintained online with a decay function and a micro-cluster deletion mechanism, so that the evolution of the data stream is reflected accurately. Finally, the computational complexity of the algorithm is analyzed, and the algorithm is compared with several well-regarded clustering algorithms in experiments on real-world data sets. The experimental results show that UCNStream achieves high clustering accuracy and processing efficiency.

Keywords: uncertain data stream; connection number; clustering; data mining

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
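
The abstract mentions online maintenance of micro-clusters through a decay function and a deletion mechanism but gives no formulas. As a rough illustration of that general idea only, the sketch below assumes an exponential decay of the form 2^(-lam * dt), as commonly used in density-based stream clustering, and a fixed weight threshold for deletion. The names `MicroCluster`, `decay`, `absorb`, `prune`, `lam`, and `min_weight` are hypothetical and are not taken from UCNStream.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    weight: float       # decayed count of absorbed points (assumed summary statistic)
    last_update: float  # timestamp of the most recent update


def decay(dt: float, lam: float) -> float:
    """Assumed decay function: the weight halves every 1/lam time units."""
    return 2.0 ** (-lam * dt)


def absorb(mc: MicroCluster, now: float, lam: float) -> None:
    """Fade the stored weight to the current time, then count the new point."""
    mc.weight = mc.weight * decay(now - mc.last_update, lam) + 1.0
    mc.last_update = now


def prune(clusters, now: float, lam: float, min_weight: float):
    """Keep only micro-clusters whose faded weight is still at least min_weight."""
    kept = []
    for mc in clusters:
        current = mc.weight * decay(now - mc.last_update, lam)
        if current >= min_weight:
            kept.append(mc)
    return kept
```

Under these assumptions, a micro-cluster that keeps receiving points holds its weight, while one that receives none fades until `prune` drops it, which is one way a summary can track how a stream evolves.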

 
