DBSCAN parameter setting based on higher-order difference and grid partition algorithm  Cited by: 8

Authors: Lan Hong[1], Zhu Helong (School of Information Engineering, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China)

Affiliation: [1] School of Information Engineering, Jiangxi University of Science & Technology, Ganzhou, Jiangxi 341000, China

Source: Application Research of Computers (《计算机应用研究》), 2020, No. 11, pp. 3347-3352 (6 pages)

Funding: National Natural Science Foundation of China (61762046); Natural Science Foundation of Jiangxi Province (20161BAB212048).

Abstract: In the DBSCAN algorithm, the two parameters eps and minPts are usually chosen by experience, which is a recognized weakness. This paper proposes a fast automatic parameter selection algorithm for DBSCAN that combines higher-order difference with grid partition (the HDOG-DBSCAN algorithm). First, it analyzes the relationship between the data points and the parameters in a dataset and uses a higher-order difference algorithm to obtain eps and minPts automatically. Second, it builds a grid index over the dataset through grid partition to improve running efficiency. Third, for datasets with too many noise points, it proposes an extreme-elimination (depolarization) operation to make the algorithm more robust. The algorithm is applied to nine datasets, including flame, and its clustering quality and running efficiency are compared with those obtained using the parameters selected by traditional DBSCAN and by the AGD-DBSCAN algorithm. The results show that HDOG-DBSCAN is an effective method for automatic DBSCAN parameter selection, that grid partition significantly improves the performance of the higher-order difference algorithm, that the extreme-elimination operation is necessary and effective, and that the method is practical.
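The abstract does not give the paper's formulas, so the following is only a rough sketch of the two general ideas it names, not the HDOG-DBSCAN algorithm itself. It assumes that eps is read off the sorted k-th-nearest-neighbor distance curve at the point where the second-order finite difference peaks (the simplest "higher-order" case), and that the grid index is a uniform grid whose cell size is at least eps. All function names (`kth_neighbor_distances`, `eps_from_difference`, `build_grid`, `grid_neighbors`) are hypothetical.

```python
from collections import defaultdict
from itertools import product

import numpy as np


def kth_neighbor_distances(points, k):
    """Sorted distance from every point to its k-th nearest neighbor (brute force)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)  # column 0 is each point's distance to itself (0)
    return np.sort(dists[:, k])


def eps_from_difference(sorted_kdist):
    """Assumed rule: take the curve value just before its sharpest upward bend,
    located as the maximum of the second-order finite difference."""
    second = np.diff(sorted_kdist, n=2)
    i = int(np.argmax(second))  # second[i] is centered on sorted_kdist[i + 1]
    return float(sorted_kdist[i + 1])


def build_grid(points, cell):
    """Map each grid cell (tuple of integer coordinates) to the indices of its points."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(np.floor(p / cell).astype(int))].append(idx)
    return grid


def grid_neighbors(points, grid, cell, i, eps):
    """Indices of points within eps of point i, scanning only the adjacent cells.
    Correct only when cell >= eps."""
    base = np.floor(points[i] / cell).astype(int)
    found = []
    for offset in product((-1, 0, 1), repeat=points.shape[1]):
        for j in grid.get(tuple(base + np.array(offset)), ()):
            if np.linalg.norm(points[j] - points[i]) <= eps:
                found.append(j)
    return found
```

With a grid of this kind, a neighborhood query touches only the points in the 3^d cells around the query point instead of the whole dataset, which is the usual reason a grid index speeds up density-based clustering. How the paper derives minPts and how it performs the extreme-elimination step are not stated in the abstract and are not sketched here.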

Keywords: density clustering; parameter selection; higher-order difference; grid partition; extreme elimination (depolarization)

Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
