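Affiliation: [1] Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education (Jilin University)
Source: Journal of Computer Research and Development (计算机研究与发展), 2010, No. 6, pp. 1044-1052 (9 pages)
Funding: National Key Technology R&D Program of China (2006BAK01A33); Science and Technology Development Program of Jilin Province (20070321; 20090704)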
Abstract: Clustering is an important analytical tool in data mining, and density-based clustering is one of the approaches called on to handle very large databases. Motivated by the limitations of existing density-based algorithms, in particular their difficulty with data of varying density and with indistinct cluster boundaries, the paper analyzes how the points in a core point's neighborhood are distributed around it and introduces the concepts of projection points, neighborhood balance, balanceable core points, and boundary sparse points. On this basis it proposes bDBSCAN (balance-DBSCAN), a density-based clustering algorithm that improves on DBSCAN. For each core point, the algorithm projects the points in its neighborhood, normalizes the resulting vectors to unit length, and examines whether the neighborhood is balanced. Only balanceable core points are expanded: points that are balance-density-reachable from a balanceable core point are grouped into one cluster. Theoretical analysis and experimental results indicate that the algorithm discovers clusters of arbitrary shape, effectively excludes noise such as boundary sparse points, and addresses difficulties of clustering high-dimensional data such as indistinct cluster boundaries and large numbers of noise points, thereby improving clustering accuracy. Its time complexity is close to that of DBSCAN. The choice and impact of the algorithm's parameter are also discussed.
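Keywords: projection points; neighborhood balance; balanceable core points; boundary sparse points; density-based clustering algorithm
Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]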
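The neighborhood-balance idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact balance criterion, so everything specific here is an assumption: the function names, the `max_imbalance` threshold, the convention that a point is not counted in its own neighborhood, and the choice to measure balance as the length of the mean unit vector from a core point to its neighbors (near 0 when neighbors surround the point, near 1 when they all lie on one side). It is not the authors' bDBSCAN implementation.

```python
import math

def neighbors(points, i, eps):
    """Indices of points within distance eps of points[i], excluding i itself."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= eps]

def imbalance(points, i, nbrs):
    """Length of the mean unit vector from points[i] to its neighbors.

    Assumed balance measure (not taken from the paper):
    0.0 means neighbors surround the point evenly, 1.0 means all lie in one direction.
    """
    p = points[i]
    total = [0.0] * len(p)
    count = 0
    for j in nbrs:
        d = math.dist(p, points[j])
        if d == 0.0:
            continue  # a coincident point has no direction to project
        for k in range(len(p)):
            total[k] += (points[j][k] - p[k]) / d  # unit-length direction component
        count += 1
    return math.hypot(*total) / count if count else 0.0

def is_balanced_core(points, i, eps, min_pts, max_imbalance):
    """Core point in the DBSCAN sense that also passes the assumed balance test."""
    nbrs = neighbors(points, i, eps)
    return len(nbrs) >= min_pts and imbalance(points, i, nbrs) <= max_imbalance

# A center point surrounded on four sides, plus its four neighbors.
points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
center_ok = is_balanced_core(points, 0, eps=1.5, min_pts=3, max_imbalance=0.5)
edge_ok = is_balanced_core(points, 1, eps=1.5, min_pts=3, max_imbalance=0.5)
```

In this toy data the center point has neighbors on all sides, while the point at (1, 0) has enough neighbors to be a DBSCAN core point but all of them lie to its left, so under the assumed criterion it behaves like a boundary sparse point and would not be expanded.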