Local Outlier Mining Algorithm for Heterogeneous Data Based on Neighborhood Density

Cited by: 7

Authors: WANG Xiao-hui [1]; SONG Xue-kun [1]; WANG Xiao-chuan [2]

Affiliations: [1] Henan University of Chinese Medicine, Zhengzhou, Henan 450046, China; [2] Zhengzhou University, Zhengzhou, Henan 450001, China

Source: Computer Simulation (《计算机仿真》), 2021, No. 7, pp. 281-285 (5 pages)

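Funding: National Natural Science Foundation of China, Youth Program (61702164, 81703946); Henan Province Science and Technology Research Program (172102310535); Henan Province Training Program for Young Backbone Teachers in Higher Education Institutions (2020GGJS104).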

Abstract: As data sets grow in size, dimensionality, and complexity, mining their outliers becomes increasingly difficult. To address this, a local outlier mining algorithm based on neighborhood density is proposed. First, the high-dimensional data are partitioned into regions according to the computing performance of each node, and the quality of the partition is evaluated through the data distribution in each dimension. Next, kernel density is used to describe local density: the occurrence counts of the data are obtained from a Gaussian distribution, and the neighborhood density of each data point is then computed. The outlier degree of each data point is calculated from the relationship between its neighborhood and its density, which identifies the outliers in the heterogeneous data. Finally, to handle possible misjudged outliers, an outlier score is computed, and weight-based pruning is applied to improve the detection performance of this step. Simulation results on synthetic and UCI data sets show that the accuracy of outlier mining is almost unaffected when the data volume and dimensionality change, and that mining time and coverage are significantly better than those of other methods. For heterogeneous data of different types and complexity, the algorithm still maintains good mining accuracy and efficiency.

Keywords: outlier mining; region segmentation; neighborhood density; heterogeneous data; outlier score

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
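
The abstract describes the density step only in words, so the following is a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes a Gaussian kernel evaluated over each point's k nearest neighbors as the neighborhood density, and takes the outlier degree to be the ratio of the mean neighbor density to the point's own density. The function name `neighborhood_density_scores` and the parameters `k` and `bandwidth` are hypothetical; region segmentation and weight-based pruning are not reproduced here.

```python
import numpy as np

def neighborhood_density_scores(X, k=5, bandwidth=1.0):
    """Illustrative kernel-density outlier degree (assumed form, not the paper's)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # Indices and distances of the k nearest neighbors of each point.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)
    # Unnormalized Gaussian kernel averaged over the neighborhood;
    # the normalizing constant cancels in the ratio below.
    density = np.exp(-0.5 * (nn_dist / bandwidth) ** 2).mean(axis=1)
    density = np.maximum(density, 1e-12)  # guard against division by zero
    # Outlier degree: mean density of the neighbors divided by own density.
    return density[nn_idx].mean(axis=1) / density

# Toy data: a 5x5 grid of inliers plus one distant point (index 25).
grid = np.array([[0.2 * i, 0.2 * j] for i in range(5) for j in range(5)])
X = np.vstack([grid, [[5.0, 5.0]]])
scores = neighborhood_density_scores(X, k=5, bandwidth=1.0)
print(int(np.argmax(scores)))
```

Points inside a dense region score close to 1, while an isolated point scores far higher because its neighbors are much denser than it is. Any threshold on the score would be a further assumption.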
