Research on the density-based spatial clustering of applications with noise (DBSCAN) algorithm  Cited by: 19

DBSCAN: Density-based spatial clustering of applications with noise


Authors: Bi Fangming (毕方明)[1], Wang Weikui (王为奎)[2], Chen Long (陈龙)[1]

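Affiliations: [1] Department of Information Security, School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116; [2] Xuzhou Air Force College, Xuzhou 221000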

Source: Journal of Nanjing University (Natural Science), 2012, No. 4, pp. 491-498 (8 pages)

Funding: National Natural Science Foundation of China (60970032); Natural Science Foundation of Jiangsu Province (BK2007035)

Abstract: DBSCAN (density-based spatial clustering of applications with noise) is a density-based spatial clustering algorithm. It defines clusters by density, requiring that the number of objects contained in a given region be no less than a given threshold. Its notable advantages are fast clustering, effective handling of noise, and the ability to find clusters of arbitrary shape. However, the algorithm operates directly on the database and uses a single global parameter to characterize density, which leads to three deficiencies: it requires large amounts of memory and I/O; clustering quality is poor when the spatial density is non-uniform, because the same global values are applied everywhere; and it is highly sensitive to its input parameters. This paper addresses these three deficiencies. First, according to the spatial distribution of the data, the whole data space is divided into several smaller partitions so that the local density within each partition is relatively more uniform. Second, an improved DBSCAN algorithm is applied to each local partition; it adaptively selects the neighbors of a center point according to the distribution of the spatial data, then samples these neighbors and expands from them, which improves both the accuracy and the efficiency of clustering. Third, the resulting clusters are merged according to merging rules. Finally, simulation experiments verify that the improved DBSCAN algorithm resolves the problems of excessive memory consumption, poor clustering quality, and sensitivity to global parameters.
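The density criterion described in the abstract can be illustrated with a minimal sketch of the standard (baseline) DBSCAN procedure. It is not the improved algorithm of the paper: partitioning, adaptive neighbor selection, sampling, and merging are not shown, because this record gives no details for them. The names `eps`, `min_pts`, and `region_query`, as well as the sample points, are assumptions chosen for illustration.

```python
from math import dist

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Baseline DBSCAN sketch: one label per point, -1 marks noise."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        # Density threshold: a core point needs at least min_pts neighbors.
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because `eps` and `min_pts` are applied globally here, a single setting has to fit every region of the data, which is the sensitivity the abstract attributes to the original algorithm. Each `region_query` call also scans all points, so this sketch makes no attempt to reduce memory or I/O cost.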

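Keywords: data mining; spatial clustering; density-based spatial clustering of applications with noise (DBSCAN); data partitioning; parameter adaptation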

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]

 
