Study on Algorithms for Local Outlier Detection

Times cited: 96

Authors: Xue Anrong[1], Ju Shiguang[1], He Weihua[1], Chen Weihe[1]

Affiliation: [1] School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China

Source: Chinese Journal of Computers, 2007, No. 8, pp. 1455-1463 (9 pages)

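Funding: Supported by the National Natural Science Foundation of China (60603041), the Natural Science Foundation of Jiangsu Higher Education Institutions (05KJB520017), and the Natural Science Foundation of Jiangsu Province (BK2006073).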

Abstract: Outlier detection has attracted much attention recently. There are two kinds of outliers: global outliers and local outliers. In many scenarios, detecting local outliers is more valuable than detecting global ones. To mine local outliers, it is more meaningful to assign each object a degree of being an outlier than a binary label. Several representative existing algorithms for this problem are compared in detail, and their disadvantages are pointed out: high computational cost, and detection accuracy that depends on parameters supplied by the user. In general, the attributes of each data object can be divided into inherent attributes and context attributes. The inherent attributes characterize the data object itself. The context attributes capture the relationship between the object and its neighboring objects and are not intrinsic to it. To overcome the disadvantages above, this paper proposes using the context attributes to determine an object's neighborhood and the inherent attributes to compute its outlier score. For spatial data, the attributes comprise spatial dimensions and non-spatial dimensions. The spatial attributes provide a location index for each data object, and neighborhoods in Euclidean space play a very important role in the analysis of spatial data. Accordingly, the spatial attributes are used to determine the spatial neighborhood, and the non-spatial attributes are used to compute the spatial outlier score; a spatial outlier mining algorithm is designed on this basis. The paper also proposes a novel measure, the spatial local outlier factor (SLOF), which captures the local behavior of a datum within its spatial neighborhood. Experimental results show that the proposed SLOF algorithm outperforms other existing algorithms. It depends less on user-supplied parameters, achieves higher detection accuracy, scales better, and runs more efficiently.
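The abstract's central idea can be illustrated with a minimal Python sketch. Context (spatial) attributes choose each object's neighborhood, and the inherent (non-spatial) attribute is then scored against that neighborhood. This is not the paper's SLOF, because the abstract does not give the SLOF formula.

The scoring rule below is a hypothetical stand-in. It takes an object's deviation from the trimmed mean of its neighbors' values and divides it by the neighbors' own mean absolute deviation from that trimmed mean. The trimmed mean echoes the "trimmed mean" keyword, but how the paper uses it is not stated here. The record layout `(x, y, value)`, the function names, and the parameters `k` and `trim` are assumptions. The brute-force neighbor search stands in for a spatial index such as the R*-tree listed in the keywords.

```python
import math
from typing import List, Tuple

# Hypothetical record layout: (x, y, value).
# (x, y) are the spatial (context) attributes; value is one non-spatial (inherent) attribute.
Point = Tuple[float, float, float]


def spatial_neighbors(points: List[Point], i: int, k: int) -> List[int]:
    """Indices of the k objects closest to points[i], using only the spatial attributes.

    Brute-force scan; a spatial index (e.g. an R*-tree) would normally replace this.
    """
    xi, yi, _ = points[i]
    others = [j for j in range(len(points)) if j != i]
    # Python's sort is stable, so equidistant objects keep their original index order.
    others.sort(key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi))
    return others[:k]


def trimmed_mean(values: List[float], trim: int = 1) -> float:
    """Mean after dropping the `trim` smallest and `trim` largest values (if enough remain)."""
    s = sorted(values)
    if len(s) > 2 * trim:
        s = s[trim:len(s) - trim]
    return sum(s) / len(s)


def local_outlier_scores(points: List[Point], k: int = 4, trim: int = 1) -> List[float]:
    """Hypothetical local score (NOT the paper's SLOF).

    Neighborhood: the k nearest objects by spatial attributes.
    Score: |value - trimmed mean of neighbor values| divided by the neighbors'
    mean absolute deviation from that trimmed mean. Small scores mean the object
    agrees with its neighborhood; large scores flag a candidate local outlier.
    """
    eps = 1e-9  # guards against division by zero in perfectly flat neighborhoods
    scores = []
    for i, (_, _, v) in enumerate(points):
        nbr_vals = [points[j][2] for j in spatial_neighbors(points, i, k)]
        center = trimmed_mean(nbr_vals, trim)
        spread = sum(abs(u - center) for u in nbr_vals) / len(nbr_vals)
        scores.append(abs(v - center) / (spread + eps))
    return scores


# Usage: a 5x5 grid whose value rises smoothly, with one anomalous value at (2, 2).
grid = [(float(x), float(y), 10.0 + 0.1 * (x + y)) for x in range(5) for y in range(5)]
grid[12] = (2.0, 2.0, 50.0)  # index 12 is x=2, y=2
scores = local_outlier_scores(grid, k=4)
print(max(range(len(grid)), key=scores.__getitem__))  # → 12
```

Trimming is the design choice that matters here. It keeps a single extreme neighbor (here the planted value) from dragging the center estimate of the objects around it, so those neighbors are not flagged along with the true outlier.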

Keywords: outlier detection; local outlier factor; R*-tree; data mining; spatial outlier; trimmed mean

CLC number: TP311 [Automation and Computer Technology / Computer Software and Theory]

 
