Affiliation: [1] Department of Computer Science and Technology, Guangdong University of Finance, Guangzhou 510521
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2014, No. 1, pp. 50-54 (5 pages)
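Funding: Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2011B090400466); Guangdong Provincial Education Science Planning Project (2010tjk119); Guangdong University of Finance University-Level Research Project (11XJ04-03)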
Abstract: Shared nearest neighbor (SNN) similarity can effectively overcome cluster-validity problems caused by factors such as high dimensionality and varying density, but computing the SNN similarity matrix is expensive. Based on a divide-and-conquer strategy, this paper proposes an improved shared nearest neighbor clustering algorithm (DC-SNN). A soft partitioning strategy splits the dataset into several small subsets. This reduces the number of data points that must be searched when computing the SNN similarity matrix, and it also avoids the adverse effect that subset boundaries would otherwise have on the K nearest neighbors of data points. Using two kinds of points defined within each subset, core data points and extended data points, the paper gives a method for computing the SNN similarity matrix of each subset and a strategy for merging these matrices, which ensures that the subset matrices validly represent the SNN similarity matrix of the entire dataset. Experimental results show that DC-SNN significantly improves the efficiency of shared nearest neighbor clustering without reducing clustering accuracy.
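The abstract does not give DC-SNN's formulas, so the Python sketch below only illustrates the two ideas it names, under stated assumptions. First, it assumes the common SNN definition: two points that are in each other's k-NN lists get a similarity equal to the number of neighbors they share, and all other pairs get 0. The paper's exact variant may differ. Second, it assumes a soft partition in which each subset holds its core points plus "extended" points taken from across the boundary. A core point's k-NN list is then computed inside its subset and is certified to match the global result. The two-way split on the first coordinate, the `margin` parameter, the fallback to a full search, and all function names are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from itertools import combinations

# Hypothetical sketch of SNN similarity with a soft partition; not the DC-SNN code from the paper.

def _ranked(points, i, candidates):
    """Candidates sorted by (Euclidean distance to point i, index), excluding i itself."""
    return sorted((math.dist(points[i], points[j]), j) for j in candidates if j != i)

def knn_global(points, k):
    """Brute-force k-NN sets: every point searches the whole dataset."""
    n = len(points)
    return [{j for _, j in _ranked(points, i, range(n))[:k]} for i in range(n)]

def knn_soft_partition(points, k, t, margin):
    """Hypothetical two-subset soft partition on coordinate 0 at threshold t.

    Each subset = its core points plus 'extended' points from the other side lying
    within `margin` of the boundary. Core points search only their own subset.
    Returns (k-NN sets indexed by point, number of points that needed a full search).
    """
    n = len(points)
    x = [p[0] for p in points]
    subsets = [
        # (core points, search pool = core + extended, far edge of the pool on coordinate 0)
        ([i for i in range(n) if x[i] < t], [i for i in range(n) if x[i] < t + margin], t + margin),
        ([i for i in range(n) if x[i] >= t], [i for i in range(n) if x[i] >= t - margin], t - margin),
    ]
    nn = [None] * n
    fallbacks = 0
    for core, pool, edge in subsets:
        for i in core:
            ranked = _ranked(points, i, pool)
            kth = ranked[k - 1][0] if len(ranked) >= k else math.inf
            # Every point outside the pool is at least |x_i - edge| away from point i,
            # so a strictly smaller k-th distance means the local k-NN list is the global one.
            if kth < abs(x[i] - edge):
                nn[i] = {j for _, j in ranked[:k]}
            else:
                fallbacks += 1  # too close to the pool's edge: search the whole dataset
                nn[i] = {j for _, j in _ranked(points, i, range(n))[:k]}
    return nn, fallbacks

def snn_matrix(nn):
    """SNN similarity: shared-neighbor count for mutual k-NN pairs, 0 otherwise."""
    n = len(nn)
    sim = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if j in nn[i] and i in nn[j]:
            sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim

if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(200)]
    local_nn, fallbacks = knn_soft_partition(pts, k=7, t=0.5, margin=0.15)
    print("matches global k-NN:", local_nn == knn_global(pts, 7), "| full searches:", fallbacks)
```

The key step is the comparison `kth < abs(x[i] - edge)`. Any point left out of a subset lies at least that far from the core point, so it cannot displace any of the k neighbors found locally. Points that fail the check fall back to a global search. A wider `margin` reduces fallbacks but enlarges every subset, which is the same trade-off between search cost and boundary effects that the abstract describes.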
Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]