Authors: LUO Shuwen; WAN Renxia; MIAO Duoqian [3]
Affiliations: [1] General Education Center, Quanzhou University of Information Engineering, Quanzhou, Fujian 362000, China; [2] College of Mathematics and Information Science, North Minzu University, Yinchuan, Ningxia 750021, China; [3] College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
Source: Journal of Shanxi University (Natural Science Edition), 2024, No. 1, pp. 30-39 (10 pages)
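Funding: National Natural Science Foundation of China (61662001); Fundamental Research Funds for the Central Universities (FWNX04); Natural Science Foundation of Ningxia (2021AAC03203).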
Abstract: The density peak clustering algorithm CFSFDP (clustering by fast search and find of density peaks) cannot select cluster centers automatically, which leaves the choice of centers uncertain. To address this problem, we incorporate three-way decision theory and propose a three-way decision-based density peak clustering algorithm with cluster center preselection (TDPC). First, the statistical characteristics of two parameters, density and distance, are used to divide the data objects into a core region, a boundary region, and a trivial region. Qualified cluster centers are assigned to the core region, and suspected cluster centers that are difficult to judge are placed in the boundary region. The suspected cluster centers are then analyzed using a defined k-reachable region and a discriminant criterion, and the actual cluster centers are selected. The proposed algorithm thereby effectively solves the problem of automatically determining cluster centers in density peak clustering. TDPC is evaluated on two synthetic datasets and four UCI (University of California, Irvine) public datasets. Compared with the CFSFDP algorithm and the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, TDPC achieves the best result, or a result close to that of the best algorithm, on the silhouette coefficient, DB (Davies-Bouldin) index, adjusted mutual information, adjusted Rand index, FM (Fowlkes-Mallows) index, homogeneity, and completeness. These results indicate that TDPC outperforms the compared algorithms in overall clustering performance and is feasible and effective for clustering.
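The preselection step described in the abstract can be illustrated with a rough sketch. The abstract does not give the thresholds, the k-reachable region, or the discriminant criterion, so everything below is an assumption made for illustration: the function names `rho_delta` and `three_way_partition`, the cut-off kernel with distance `dc`, and the rule "mean plus a multiple of the standard deviation" with the hypothetical parameters `strict` and `loose`. It is not the authors' TDPC, only one way the three regions could be formed from density and distance.

```python
import math
import statistics


def rho_delta(points, dc):
    """Compute the two CFSFDP parameters for every point.

    rho:   local density, here the number of other points closer than dc
           (cut-off kernel; dc is a user-chosen cut-off distance).
    delta: distance to the nearest point of strictly higher density.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour (a density peak, possibly tied)
        # gets its largest distance to any other point instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def three_way_partition(rho, delta, strict=1.5, loose=0.5):
    """Split point indices into core, boundary and trivial regions.

    ASSUMED rule (not taken from the paper): a point is a candidate only if
    its density is at least the mean density; it is 'core' if delta exceeds
    mean + strict * std, 'boundary' if delta exceeds mean + loose * std,
    and 'trivial' otherwise.
    """
    def threshold(values, k):
        return statistics.fmean(values) + k * statistics.pstdev(values)

    rho_min = statistics.fmean(rho)
    delta_core = threshold(delta, strict)
    delta_boundary = threshold(delta, loose)

    core, boundary, trivial = [], [], []
    for i, (r, d) in enumerate(zip(rho, delta)):
        if r >= rho_min and d >= delta_core:
            core.append(i)        # accepted as a cluster center
        elif r >= rho_min and d >= delta_boundary:
            boundary.append(i)    # suspected center, needs a second check
        else:
            trivial.append(i)     # ordinary point
    return core, boundary, trivial


if __name__ == "__main__":
    # Two small, well-separated groups, each with one central point.
    points = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1),
              (5, 5), (5.1, 5), (5, 5.1), (4.9, 5), (5, 4.9)]
    rho, delta = rho_delta(points, dc=0.12)
    print(three_way_partition(rho, delta))
```

In this sketch the boundary region is only collected, not resolved. In TDPC the suspected centers in that region are examined further with the k-reachable region and discriminant criterion, which the abstract names but does not define.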
Keywords: clustering algorithm; cluster center; boundary region; three-way clustering; density clustering; k-reachable region
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]