Research on a Spatial Outlier Mining Algorithm Based on Holographic Entropy (基于全息熵的空间离群点挖掘算法研究)  Cited by: 4

Spatial outlier detection based on holographic entropy


Authors: Xue Anrong (薛安荣)[1], He Feng (何峰)[1], Wen Dandan (闻丹丹)[1]

Affiliation: [1] School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China

Source: Application Research of Computers (《计算机应用研究》), 2014, No. 2, pp. 369-372, 397 (5 pages)

Funding: National Natural Science Foundation of China (61300228); Specialized Research Fund for the Doctoral Program of Higher Education (20093227110005)

Abstract: Distance-based and density-based outlier detection algorithms scale poorly with dimensionality and data volume. Information-theoretic outlier detection algorithms, which assume mutually independent attributes and handle only categorical attributes, are also ill-suited to spatial outlier detection because spatial data are autocorrelated and heterogeneous. This paper therefore proposes a spatial outlier detection algorithm for mixed attributes based on holographic entropy. The algorithm partitions the data into regions using a region-identifier attribute, determines spatial neighborhoods within each region from spatial relationships, and retrieves them with an R*-tree. On this basis, it defines a holographic-entropy-based measure of spatial outlier degree and a spatial outlier mining algorithm, which together effectively address the measurement of outlier degree over mixed attributes and the mining of outliers. Because region partitioning lends itself to parallel computation, the approach can accommodate large data volumes. Theoretical analysis and experiments indicate that the proposed algorithm has advantages in computational efficiency and in the interpretability of its results.
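The abstract does not give the formula for the outlier degree, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the common definition of holo-entropy as the sum of per-attribute Shannon entropies, and it scores each object by how much the holo-entropy of its neighborhood drops when that object is removed. The function names are hypothetical; the sketch handles categorical attributes only and takes the spatial neighborhood as given, so region partitioning, R*-tree retrieval, and numeric attributes are all left out.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (natural log) of one categorical attribute."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def holo_entropy(records):
    """Assumed definition: sum of per-attribute entropies over the records.

    `records` is a list of equal-length tuples, one tuple per object.
    """
    if not records:
        return 0.0
    # zip(*records) yields one column (attribute) at a time.
    return sum(attribute_entropy(column) for column in zip(*records))


def outlier_scores(neighborhood):
    """Hypothetical outlier degree for each object in one neighborhood.

    The score is the holo-entropy of the whole neighborhood minus the
    holo-entropy with the object removed; a larger positive value means
    the object contributes more disorder to its neighborhood.
    """
    base = holo_entropy(neighborhood)
    return [
        base - holo_entropy(neighborhood[:i] + neighborhood[i + 1:])
        for i in range(len(neighborhood))
    ]
```

For a neighborhood of four identical objects plus one that differs in every attribute, the differing object receives the highest score under this definition, since removing it makes the neighborhood perfectly uniform.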

Keywords: holographic entropy; R*-tree; spatial outlier; outlier detection; mixed attributes

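Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]; TP301.6 [Automation and Computer Technology: Computer Science and Technology]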

 
