High Frequency Data Co-Occurrence Clustering Algorithm Based on Local Outlier Detection (cited by: 8)

Authors: ZHOU Zhi-hong [1,2]; MA Jin; XIA Zheng-min [1,2]; CHEN Xiu-zhen [1,2]

Affiliations: [1] Institute of Network Security Technology, Shanghai Jiao Tong University, Shanghai 200240, China; [2] Shanghai Key Laboratory of Information Security Integrated Management Technology, Shanghai 200240, China

Source: Computer Simulation (《计算机仿真》), 2021, No. 3, pp. 482-486 (5 pages)

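Funding: Shanghai Industrial Foundation Strengthening Special Project, "Intelligent Connected Vehicle Information Security R&D and Public Service Platform" (GYQJ-2018-3-03).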

Abstract: High-frequency data are prone to anomalies and tend to be unordered. This paper therefore studies a co-occurrence clustering algorithm for high-frequency data based on local outlier detection. First, local outlier detection with variable grid partitioning is used to mine the high-frequency data objects in a high-frequency data set. Second, abnormal high-frequency data objects are removed, and the local outlier factor values of the data objects are sorted in descending order to obtain the objects with larger outlier factor values, which improves the execution efficiency of the co-occurrence clustering. Then the co-occurrence similarity of the obtained objects is computed to form a co-occurrence similarity matrix. Finally, the clusters with the maximum similarity are merged according to the similarity matrix, which completes the co-occurrence clustering. Simulation results show that the algorithm accurately detects the number of outliers in high-frequency data sets and that the clustering is both efficient and accurate.
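The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the variable grid partitioning or the co-occurrence similarity measure, so the sketch assumes a plain k-nearest-neighbour local outlier factor (no grid), Jaccard similarity over item sets as a stand-in for co-occurrence similarity, and single-linkage merging with a stopping threshold. The names `lof_scores`, `cooccurrence_clusters`, `k`, and `min_sim` are hypothetical.

```python
import math


def lof_scores(points, k=3):
    """Simplified local outlier factor using exactly k nearest neighbours.

    Assumption: a plain k-NN search stands in for the paper's variable
    grid partitioning, which the abstract does not specify.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def local_reach_density(i):
        # Reachability distance of i from j is max(k-distance(j), d(i, j)).
        reach = sum(max(k_dist[j], dist[i][j]) for j in knn[i])
        return len(knn[i]) / max(reach, 1e-12)  # guard against duplicates

    lrd = [local_reach_density(i) for i in range(n)]
    # LOF is the mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (len(knn[i]) * lrd[i]) for i in range(n)]


def jaccard(a, b):
    """Assumed co-occurrence similarity: shared items over all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_clusters(item_sets, min_sim=0.5):
    """Repeatedly merge the two most similar clusters (single linkage)."""
    sim = [[jaccard(a, b) for b in item_sets] for a in item_sets]
    clusters = [[i] for i in range(len(item_sets))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = max(sim[i][j] for i in clusters[x] for j in clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < min_sim:  # hypothetical stopping rule
            break
        x, y = pair
        merged = clusters.pop(y)  # y > x, so index x stays valid
        clusters[x].extend(merged)
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    scores = lof_scores(pts, k=2)
    # Rank objects by LOF in descending order, as the abstract describes.
    ranked = sorted(range(len(pts)), key=lambda i: scores[i], reverse=True)
    print(ranked, scores)
    sets = [{"a", "b", "c"}, {"a", "b"}, {"x", "y"}, {"x", "y", "z"}]
    print(cooccurrence_clusters(sets, min_sim=0.5))
```

In this sketch a point whose LOF is well above 1 sits in a sparser region than its neighbours and is a candidate outlier. How the ranked objects feed into the clustering stage is left open here, because the abstract does not state the cutoff or selection rule.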

Keywords: local outlier; high-frequency data; co-occurrence similarity; variable grid partitioning

Classification No.: TP391.9 [Automation and Computer Technology: Computer Application Technology]

 
