Local Outlier Detection Method Based on Improved K-means (基于改进K-means的局部离群点检测方法)


Authors: ZHOU Yu (周玉)[1]; XIA Hao (夏浩); YUE Xuezhen (岳学震); WANG Peichong (王培崇) (School of Electrical Eng., North China Univ. of Water Resources and Electric Power, Zhengzhou 450045, China; School of Info. Eng., Hebei Univ. of Geosciences, Shijiazhuang 050031, China)

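Affiliations: [1] School of Electrical Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450045, Henan, China; [2] School of Information Engineering, Hebei University of Geosciences, Shijiazhuang 050031, Hebei, China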

Source: Advanced Engineering Sciences (《工程科学与技术》), 2024, No. 4, pp. 66-77 (12 pages)

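Funding: National Natural Science Foundation of China (U1504622, 31671580); Training Program for Young Backbone Teachers in Higher Education Institutions of Henan Province (2018GGJS079); Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2020344).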

Abstract: Outlier detection is the task of identifying anomalous data whose feature attributes differ significantly from those of normal data. Most clustering-based outlier detection methods detect outliers from a global perspective and perform poorly on local outliers. To address this, this paper improves the K-means clustering algorithm by introducing the clustering by fast search and find of density peaks method, and proposes a local outlier detection method named KLOD (local outlier detection based on improved K-means and least-squares methods) for accurate detection of local outliers. First, the density peaks method is used to compute the local density and relative distance of each data point, and the two are multiplied to obtain the γ value. Second, the γ values are sorted in descending order, and the elbow rule is used to select the k data points with the largest γ values as the initial cluster centers for K-means. Third, K-means partitions the dataset into k clusters, and the objective function value of each data point in each dimension is computed and sorted in ascending order. Next, the degree of dispersion of each dimension is determined and an appropriate fitting function and fitting points are chosen; the ascending per-dimension objective function values of each cluster are fitted by the least-squares method, and the fitted function is differentiated to obtain the rate of change. Finally, combined with information entropy, each data point's objective function value in each dimension is weighted by multiplying it by the corresponding rate of change to give the final outlier score, and the top-n data points with the highest scores are treated as outliers. Simulation experiments on synthetic and UCI datasets compare the accuracy of the KLOD, LOF, and KNN methods. The results show that KLOD achieves higher accuracy than KNN and LOF. The proposed KLOD method effectively improves the clustering quality of K-means and offers good accuracy and performance in local outlier detection.

Objective: Outliers are defined as data points generated for various special reasons. They are often regarded as noise because they deviate from normal data points, yet they make up only a small proportion of the dataset and are of research value. The task of outlier detection is to identify these points and analyze the abnormal information they may carry by examining the attribute features of the data. This process aims to uncover unusual patterns or behaviors within the dataset that can provide insight into unique phenomena or anomalies. Most clustering-based outlier detection methods detect outliers from a global perspective and perform poorly on local outliers. Hence, an improved K-means clustering algorithm is proposed by introducing the clustering by fast search and find of density peaks method. A local outlier detection method, named KLOD (local outlier detection based on improved K-means and least-squares methods), is developed to achieve precise detection of local outliers.

Methods: The K-means clustering algorithm is a hard clustering method, meaning that after clustering, each data point belongs to exactly one cluster. This property makes it suitable for outlier detection, as outliers significantly affect the clustering process. However, selecting the initial cluster centers and determining the number of clusters are crucial, as they directly affect clustering effectiveness. To select accurate cluster centers, clustering by fast search and find of density peaks is used to compute the local density and relative distance of the data points, and a decision graph is constructed from these metrics. The challenge lies in accurately determining the cutoff distance dc, which makes it difficult to identify the number of cluster centers precisely from a decision graph obtained with a single dc value. The elbow method is employed to determine the optimal number of clusters for an unknown dataset for the best clustering effectiveness to address the challenge…
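The initialization step described in the abstract (local density, relative distance, their product γ, and the k points with the largest γ as initial centers) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the Gaussian-kernel form of the local density, the function names `density_peak_gamma` and `initial_centers`, and passing `k` directly instead of choosing it with the elbow rule are all assumptions made for the example.

```python
import numpy as np

def density_peak_gamma(X, dc):
    """Return (rho, delta, gamma) for each row of X.

    Assumption: rho uses a Gaussian kernel with cutoff distance dc;
    the abstract does not state which density estimate is used.
    """
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Local density rho; subtract 1 to remove each point's self term exp(0).
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    # Relative distance delta: distance to the nearest point of higher density.
    n = X.shape[0]
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            # No denser point exists: use the largest distance from this point.
            delta[i] = dist[i].max()
    return rho, delta, rho * delta

def initial_centers(X, k, dc):
    """Pick the k points with the largest gamma as initial cluster centers."""
    _, _, gamma = density_peak_gamma(X, dc)
    idx = np.argsort(-gamma)[:k]  # indices sorted by descending gamma
    return X[idx], idx

# Toy data (hypothetical): two 3x3 grids centered at (0, 0) and (10, 10).
grid = np.array([(x, y) for x in (-0.5, 0.0, 0.5) for y in (-0.5, 0.0, 0.5)])
X_demo = np.vstack([grid, grid + 10.0])
centers_demo, idx_demo = initial_centers(X_demo, k=2, dc=1.0)
print(idx_demo, centers_demo)
```

The returned centers could then seed any K-means implementation that accepts explicit starting points, for example scikit-learn's `KMeans(n_clusters=k, init=centers, n_init=1)`. The value of `dc` here is arbitrary; the abstract itself notes that choosing dc accurately is the hard part.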

Keywords: outlier detection; K-means clustering; least-squares method; density peaks; objective function value

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
