Research on Shared Nearest Neighbor Clustering for Large Dataset

Cited by: 5

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2014, No. 1, pp. 50-54 (5 pages)

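Funding: Supported by the Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2011B090400466), the Guangdong Provincial Education Science Planning Project (2010tjk119), and a Guangdong University of Finance university-level research project (11XJ04-03).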

Abstract: Shared nearest neighbor (SNN) similarity can effectively overcome the cluster-validity problems caused by factors such as high dimensionality and multiple densities, but computing the SNN similarity matrix is expensive. Based on a divide-and-conquer strategy, an improved shared nearest neighbor clustering algorithm (DC-SNN) is proposed to address this issue. A soft partitioning strategy divides the dataset into several small subsets. This reduces the number of data points that must be searched when computing the SNN similarity matrix of each subset, and it also avoids the adverse effect that subset partition boundaries would otherwise have on the K nearest neighbors of data points. Furthermore, based on two terms defined within each subset, core data points and extended data points, a computing method and a merging strategy for the per-subset SNN similarity matrices are given, ensuring that the SNN similarity matrix of the whole dataset can be validly represented by those of the subsets. Experimental results show that DC-SNN significantly improves the efficiency of shared nearest neighbor clustering without reducing clustering accuracy.
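To make the cost that the abstract refers to concrete, the following is a minimal sketch of a baseline SNN similarity computation, not the DC-SNN algorithm itself. It assumes one common definition of SNN similarity (two points are similar only if each is in the other's K-nearest-neighbor list, and the similarity is the number of neighbors they share); the abstract does not state which variant the paper uses. The function names `knn_lists` and `snn_similarity` are hypothetical, and the paper's soft partitioning and merging steps are not reproduced here.

```python
from math import dist


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors.

    Brute force: every point is compared with every other point, which is
    the quadratic search that makes SNN costly on large datasets.
    """
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by distance; break ties by index so results are deterministic.
        others.sort(key=lambda j: (dist(p, points[j]), j))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(points, k):
    """Return the SNN similarity matrix as a list of lists.

    Assumed definition: sim[i][j] is the number of shared k-nearest
    neighbors if i and j appear in each other's neighbor lists, else 0.
    """
    nn = knn_lists(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in nn[i] and i in nn[j]:
                sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim


if __name__ == "__main__":
    # Two well-separated groups of four points each (illustrative data only).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    for row in snn_similarity(data, k=3):
        print(row)
```

In this sketch every point searches the entire dataset for its neighbors. As described in the abstract, DC-SNN instead restricts that search to small, softly partitioned subsets and then merges the per-subset similarity matrices.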

Keywords: shared nearest neighbor; divide and conquer; large dataset; cluster analysis

Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
