Authors: ZHOU Yu (周玉) [1]; XIA Hao (夏浩); PEI Zexuan (裴泽宣)
Affiliation: [1] School of Electrical Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450045, China
Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2024, No. 8, pp. 68-85 (18 pages)
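Funding: National Natural Science Foundation of China (U1504622, 31671580); Training Program for Young Backbone Teachers in Colleges and Universities of Henan Province (2018GGJS079).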
Abstract: To address the limitations of global outlier detection methods, which cannot detect local outliers, and the performance degradation of the local outlier factor when many local outliers are present, this paper proposes an outlier detection and interpretation method based on an improved version of the density peak clustering algorithm (clustering by fast search and find of density peaks), called KDPC, which uses k-nearest neighbors (KNN) and kernel density estimation (KDE). The method analyzes each data point from both a global and a local perspective. First, the local density of each data point is calculated using k-nearest neighbors and kernel density estimation, replacing the truncation-distance-based local density of the traditional DPC algorithm. Second, the sum of a data point's k-nearest-neighbor distances is used as its global anomaly value, and the KDPC clustering algorithm is used to calculate the cluster density and the local anomaly value of each data point. Finally, the global and local anomaly values of each data point are multiplied to give its final anomaly score, the Top-n data points with the highest anomaly scores are selected as outliers, and global and local outliers are interpreted by constructing a global-local anomaly value decision diagram. Experiments were conducted on artificial and UCI datasets, and the method was compared with 10 commonly used outlier detection methods. The results show that the method achieves high detection accuracy and performance for both global and local outliers, and that its AUC is only slightly affected by the value of k. The method was also applied to NBA player data, further demonstrating its practicality and effectiveness.
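The scoring pipeline summarized in the abstract (k-nearest-neighbor distance sum as the global value, a KDE-based local density, and the product of global and local values as the final score) can be illustrated with a rough sketch. This is not the authors' KDPC implementation: the abstract does not give the cluster-density formula, so the local anomaly value below is a stand-in (mean neighbor density divided by the point's own density), and the function names, the Gaussian kernel, and the `bandwidth` parameter are all assumptions made for illustration.

```python
import numpy as np


def knn_distances(X, k):
    """Return (distances, indices) of the k nearest neighbours of each row, excluding itself."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx


def anomaly_scores(X, k=5, bandwidth=1.0):
    """Product of a global and a local anomaly value for every point."""
    d, idx = knn_distances(X, k)
    # Global anomaly value: sum of the k-nearest-neighbour distances (as in the abstract).
    global_value = d.sum(axis=1)
    # Local density: Gaussian kernel averaged over the k nearest neighbours
    # (ASSUMPTION: kernel and bandwidth are illustrative choices).
    density = np.exp(-(d ** 2) / (2.0 * bandwidth ** 2)).mean(axis=1)
    # Local anomaly value (ASSUMPTION: stand-in for the paper's cluster-density-based
    # definition): how much denser the neighbours are than the point itself.
    local_value = density[idx].mean(axis=1) / (density + 1e-12)
    return global_value * local_value


def top_n_outliers(X, n, k=5, bandwidth=1.0):
    """Indices of the n points with the highest anomaly score."""
    scores = anomaly_scores(X, k=k, bandwidth=bandwidth)
    return np.argsort(scores)[::-1][:n]
```

Calling `top_n_outliers(X, n)` on an array of shape `(samples, features)` returns the indices of the highest-scoring points. The brute-force distance matrix keeps the sketch short but scales quadratically with the number of samples.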
Classification No.: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]