A spatial outlier mining algorithm based on distributed computing  (Cited by: 3)


Authors: ZHANG Weiping [1], LIU Jiping [1], QIU Agen [1], ZHANG Yongchuan [2], ZHAO Yangyang [3]

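Affiliations: [1] Chinese Academy of Surveying & Mapping, Beijing 100830, China; [2] Wuhan University, Wuhan 430079, China; [3] Liaoning Technical University, Fuxin, Liaoning 123000, China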

Source: Science of Surveying and Mapping (《测绘科学》), 2017, No. 8, pp. 85-90 (6 pages)

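Funding: Special Scientific Research Fund for Public Welfare Industry of Surveying, Mapping and Geoinformation (201512032; 201512027); Basic Research Fund of the Chinese Academy of Surveying and Mapping (7771414)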

Abstract: Existing spatial outlier mining algorithms cannot meet the demands of large-scale spatial data mining. To address this, the paper proposes a spatial outlier mining algorithm for distributed environments. First, to suit distributed computation and storage on a cluster, it partitions the data set with a space-filling curve, which speeds up the search for each target point's approximate spatial nearest neighbors. Second, it defines the spatial outlier factor using information entropy: because different attributes of multidimensional data affect the outlier factor to different degrees, the algorithm automatically computes a weight for each attribute from the characteristics of the data itself, and it uses inverse distance weighting to define the influence of spatial factors on the outlier factor. Finally, experimental results show that on large-scale spatial data sets the algorithm mines outliers far more efficiently than traditional algorithms, with a mining accuracy above 90%.
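The three ingredients named in the abstract can be illustrated with a small single-machine sketch. It is not the paper's implementation. The abstract does not give the formulas or the distributed partitioning scheme, so everything below is an assumption: a Z-order (Morton) curve stands in for the space-filling curve, the textbook entropy weight method stands in for the attribute weighting, and the outlier factor is taken as the entropy-weighted absolute difference between a point's attributes and the inverse-distance-weighted mean of its neighbors' attributes. All function names and parameters (`window`, `power`, `bits`) are hypothetical, and the attributes are assumed to be on comparable scales.

```python
import math

def morton_code(ix, iy, bits=16):
    """Interleave the bits of two non-negative grid coordinates (Z-order curve)."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code

def approx_neighbors(points, k, window=None, bits=16):
    """Return, for each point, the indices of k approximate nearest neighbors.

    Points are sorted along the Z-order curve; candidates come from a window
    around each point's position in that order and are then ranked by true
    Euclidean distance. The window size is an illustrative assumption.
    """
    n = len(points)
    window = window or 2 * k
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    scale = (1 << bits) - 1

    def cell(v, lo, hi):
        # Map a coordinate onto an integer grid of 2**bits cells.
        return 0 if hi == lo else int((v - lo) / (hi - lo) * scale)

    order = sorted(
        range(n),
        key=lambda i: morton_code(cell(xs[i], min_x, max_x),
                                  cell(ys[i], min_y, max_y), bits),
    )
    result = [None] * n
    for pos, i in enumerate(order):
        cand = order[max(0, pos - window):pos] + order[pos + 1:pos + 1 + window]
        cand.sort(key=lambda j: math.dist(points[i], points[j]))
        result[i] = cand[:k]
    return result

def entropy_weights(attrs):
    """Entropy weight method: attributes with lower entropy get higher weight."""
    n, m = len(attrs), len(attrs[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in attrs]
        lo = min(col)
        shifted = [v - lo for v in col]
        total = sum(shifted)
        if total == 0:  # a constant attribute carries no information
            divergences.append(0.0)
            continue
        probs = [v / total for v in shifted]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences] if s > 0 else [1.0 / m] * m

def outlier_factors(points, attrs, k=3, power=2.0):
    """Entropy-weighted deviation from the inverse-distance-weighted neighbor mean."""
    weights = entropy_weights(attrs)
    neighbors = approx_neighbors(points, k)
    factors = []
    for i, nbrs in enumerate(neighbors):
        # Inverse distance weights; the floor avoids division by zero
        # when two points coincide.
        idw = [1.0 / max(math.dist(points[i], points[j]), 1e-12) ** power
               for j in nbrs]
        total = sum(idw)
        score = 0.0
        for a, w_attr in enumerate(weights):
            expected = sum(w * attrs[j][a] for w, j in zip(idw, nbrs)) / total
            score += w_attr * abs(attrs[i][a] - expected)
        factors.append(score)
    return factors
```

Points with the largest factor are the outlier candidates. In a distributed setting, the sorted Z-order range would presumably be cut into contiguous chunks, one per worker, but the abstract does not say how the paper handles neighbors that fall across chunk boundaries.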

Keywords: spatial outlier; distributed computing; nearest neighbor; spatial outlier factor

CLC number: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]

 
