Authors: 张锦宏 (ZHANG Jinhong); 陈梅 (CHEN Mei) [1]; 张弛 (ZHANG Chi)
Affiliation: [1] School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2023, No. 12, pp. 2880-2895 (16 pages)
Funding: National Natural Science Foundation of China (62266029); Gansu Provincial Key Research and Development Program (21YF5GA053); Gansu Provincial Higher Education Industry Support Program (2022CYZC-36).
Abstract: Existing clustering algorithms identify arbitrary clusters with insufficient accuracy, are sensitive to density changes within a cluster and to outliers, and depend on thresholds that are hard to choose. To address these problems, an adaptive threshold-constrained density cluster backbone clustering algorithm (DCBAT) is proposed. First, an adaptive density-reachability threshold is defined from the skewness coefficient and the mean point density. Under this threshold, core points with higher local density and higher relative distance are grouped by density reachability, which yields the initial cluster backbones. Next, each non-core point is merged into the cluster of its nearest neighbor with higher density, which yields the initial clusters. Finally, an adaptive density-difference threshold is defined from the mean intra-cluster density difference and a scale factor. Under this threshold, each initial cluster is split where the point density changes sharply, which yields the final clusters. Because DCBAT takes both the distribution and the internal structure of the data into account, clustering performance improves. Experiments against five established algorithms (k-means, DBSCAN, OPTICS, CFDP and MulSim) on eight datasets of different dimensions and types show that DCBAT identifies arbitrary clusters well, is insensitive to intra-cluster density changes and to outliers, and produces accurate and stable results. Its overall performance is better than that of the compared algorithms.
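Keywords: clustering; cluster backbone; adaptive density-reachability threshold; adaptive density-difference threshold; arbitrary clusters
Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]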
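Note on the abstract: the second step it describes, merging each non-core point into the cluster of its nearest neighbor with higher density, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `assign_non_core`, the `-1` marker for unassigned points, the use of Euclidean distance, and the tie-breaking rule are all assumptions made for illustration, and the density values are taken as given because the abstract does not say how they are estimated.

```python
import math


def assign_non_core(points, density, labels):
    """Give each unlabeled point the label of its nearest denser neighbor.

    points  : list of coordinate tuples
    density : list of floats, one per point (estimation method is out of scope)
    labels  : list of ints; core points carry a cluster id, others are -1
    """
    result = list(labels)
    # Visit points from densest to sparsest so that a point's denser
    # neighbors are already labeled when the point itself is reached.
    order = sorted(range(len(points)), key=lambda i: -density[i])
    for rank, i in enumerate(order):
        if result[i] != -1:
            continue  # core point, already part of a backbone
        denser = order[:rank]  # ties are broken by sort order (an assumption)
        if not denser:
            continue  # the densest point has no denser neighbor; stays unassigned
        # Euclidean distance is an assumption; the abstract names no metric.
        nearest = min(denser, key=lambda j: math.dist(points[i], points[j]))
        result[i] = result[nearest]
    return result
```

For example, with two core points at `(0, 0)` and `(10, 0)` labeled 0 and 1, the sparser points at `(1, 0)` and `(2, 0)` chain onto cluster 0 and the point at `(11, 0)` joins cluster 1. Processing in descending density order is what lets a label propagate through a chain of progressively sparser points.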