Outlier Detection Algorithm Based on Expected Kernel Density Outlier Factor


Authors: ZHANG Zhongping (张忠平) [1,2]; SUN Guangxu (孙光旭); YAO Chunchen (姚春辰); LIU Shuo (刘硕); QI Wenxu (齐文旭) [3]

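Affiliations: [1] School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004; [2] Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Yanshan University, Qinhuangdao 066004; [3] School of Information Systems Engineering, Information Engineering University, Zhengzhou 450001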

Source: Chinese High Technology Letters (《高技术通讯》), 2024, No. 2, pp. 187-198 (12 pages)

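Funding: National Natural Science Foundation of China (61972334); Hebei Province Innovation Capability Improvement Program (222567626H); Central Government Guided Local Science and Technology Development Fund (226Z1707G); Sida Railway Intelligent Image Workpiece Recognition Fund (No. x2021134); Qinhuangdao Chengfa Health Industry Development Co., Ltd. Performance Appraisal Management System project (x2022247).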

Abstract: Density-based outlier detection methods tend to achieve low detection accuracy on datasets with differing distributions. To address this, an outlier detection algorithm based on an expected kernel density outlier factor is proposed. First, an extended neighborhood space (ENS) built from the k-nearest neighbors and reverse k-nearest neighbors replaces the traditional k-neighborhood, so that the neighborhood information of each data object is considered more comprehensively. Second, building on traditional kernel density estimation (KDE), a multivariate Gaussian kernel is used to estimate the density of each data object within its ENS, and an adaptive kernel bandwidth is adopted so that the estimate better fits the distributions of different datasets. Third, the concept of expected distance is introduced to further distinguish local outliers from normal points located in low-density regions. Finally, an expected kernel density outlier factor is defined to characterize the degree to which a data object is an outlier. The proposed algorithm is evaluated on synthetic and real datasets and compared with several traditional algorithms, and the results verify its effectiveness.
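The abstract names the building blocks but gives no formulas, so the following NumPy code is only a minimal sketch under stated assumptions, not the authors' method. It assumes ENS(p) is the union of kNN(p) and the reverse kNN RkNN(p) = {q : p ∈ kNN(q)}; it uses a hypothetical adaptive bandwidth (each sample's mean distance to its own k nearest neighbours) with an isotropic Gaussian kernel, whereas the paper's multivariate Gaussian may use a full covariance. Because the expected distance and the expected kernel density outlier factor are not defined in the abstract, the final score is an RDOS-style stand-in (mean log-density of the ENS minus the point's own log-density). All function names are hypothetical.

```python
import numpy as np


def knn_table(X, k):
    """Indices of the k nearest neighbours of each point (self excluded) plus the distance matrix."""
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))   # O(n^2) memory: fine for a small sketch
    np.fill_diagonal(D, np.inf)             # a point is not its own neighbour
    knn = np.argsort(D, axis=1, kind="stable")[:, :k]
    return knn, D


def extended_neighborhood(knn):
    """Assumed ENS(p) = kNN(p) union RkNN(p), where RkNN(p) = {q : p in kNN(q)}."""
    ens = [set(row.tolist()) for row in knn]
    for q, row in enumerate(knn):
        for p in row:
            ens[p].add(q)                   # q lists p among its kNN, so q is in RkNN(p)
    return ens


def log_ens_kde(X, k=5):
    """Log of a sample-point adaptive Gaussian KDE evaluated over each point's ENS."""
    n, d = X.shape
    knn, D = knn_table(X, k)
    ens = extended_neighborhood(knn)
    # Hypothetical adaptive bandwidth: mean distance from q to its own k nearest neighbours.
    h = np.maximum(np.take_along_axis(D, knn, axis=1).mean(axis=1), 1e-12)
    log_dens = np.empty(n)
    for p in range(n):
        nb = np.fromiter(sorted(ens[p]), dtype=int)
        sq = ((X[nb] - X[p]) ** 2).sum(axis=1)
        hq = h[nb]
        # Log of an isotropic d-variate Gaussian kernel with per-neighbour bandwidth hq.
        log_k = -sq / (2 * hq ** 2) - d * np.log(hq) - 0.5 * d * np.log(2 * np.pi)
        m = log_k.max()                     # log-sum-exp avoids underflow for far-away points
        log_dens[p] = m + np.log(np.exp(log_k - m).sum()) - np.log(len(nb))
    return log_dens, ens


def ens_outlier_scores(X, k=5):
    """Illustrative RDOS-style score (NOT the paper's factor): larger means more outlying."""
    log_dens, ens = log_ens_kde(X, k)
    return np.array([log_dens[sorted(nb)].mean() - log_dens[p] for p, nb in enumerate(ens)])


# Usage: 50 points from a tight 2-D Gaussian plus one isolated point at index 50.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)), [[4.0, 4.0]]])
scores = ens_outlier_scores(X, k=5)
print("highest-scoring index:", int(np.argmax(scores)))
```

The reverse-neighbour step is what widens the neighbourhood: a cluster point at the edge facing an isolated point gains that point in its ENS, so its neighbourhood density reflects the sparse region as well. Working in log space keeps the scores finite even when an isolated point's kernel values underflow to zero.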

Keywords: data mining; outlier; kernel density estimation (KDE); expected distance; expected kernel density outlier factor

CLC number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
