Authors: WANG Yan-e; AN Jian [2]; WANG Hong-gang [1]; DING Xin-an; YANG Qian [1]
Affiliations: [1] School of Science and Technology, Xi'an Siyuan University, Xi'an 710038, Shaanxi, China; [2] Shenzhen Research Institute of Xi'an Jiaotong University, Shenzhen 518057, Guangdong, China
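Source: Computer Technology and Development (《计算机技术与发展》), 2020, No. 7, pp. 66-70 (5 pages)
Funding: Shaanxi Provincial Education Scientific Research Program Project (18JK1100); Shenzhen Science and Technology Program Project (JCYJ20170816100939373); Shaanxi Provincial Higher Education Scientific Research Project (XGH19236).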
Abstract: This paper studies the partitioning clustering algorithm K-medoids on medical data sets. K-medoids selects its initial cluster centers at random, converges slowly, and produces unstable clustering results; to address these problems, a variance-based density optimization algorithm is proposed. Starting from the mean square deviation and the mean distance of the sample set, the algorithm computes a density radius that also depends on the size of the sample set. Under the same density radius, samples in dense regions have higher density. The algorithm dynamically selects samples from different high-density regions as the initial cluster centers and applies local optimization during clustering to accelerate convergence, which addresses the shortcomings of traditional K-medoids. The optimized algorithm is tested on medical data sets from the UCI Machine Learning Repository. The experiments show that the initial cluster centers it selects lie in dense regions of the sample set and therefore match the original distribution of the data more closely. On the breast cancer data set, the algorithm achieves higher clustering accuracy, stable clustering results, and fast convergence.
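The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not give the paper's density-radius formula, so the radius used here (`scale` times the mean pairwise distance) is a stand-in assumption, and the names `density_init`, `k_medoids`, and `scale` are hypothetical. The "local optimization" step is likewise assumed to mean searching for a new medoid only among the members of each cluster.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_init(points, k, scale=0.5):
    """Pick k initial medoids from different dense regions.

    ASSUMPTION: radius = scale * mean pairwise distance. The paper derives
    its radius from the mean square deviation, the mean distance, and the
    sample-set size; that exact formula is not available here.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    mean_dist = sum(d[i][j] for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2)
    radius = scale * mean_dist
    # Density of a sample = number of samples within the radius (itself included).
    density = [sum(1 for j in range(n) if d[i][j] <= radius) for i in range(n)]
    candidates = set(range(n))
    centers = []
    while len(centers) < k and candidates:
        # Densest remaining sample; ties go to the lowest index.
        best = max(candidates, key=lambda i: (density[i], -i))
        centers.append(best)
        # Drop its neighbourhood so the next center comes from another region.
        candidates -= {j for j in candidates if d[best][j] <= radius}
    if len(centers) < k:
        # Fallback: fill up with the densest samples not chosen yet.
        rest = sorted((i for i in range(n) if i not in centers), key=lambda i: -density[i])
        centers += rest[:k - len(centers)]
    return centers, d

def assign(d, medoids):
    """Label each sample with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]]) for i in range(len(d))]

def k_medoids(points, k, max_iter=100, scale=0.5):
    """K-medoids with density-based initialization (illustrative sketch)."""
    medoids, d = density_init(points, k, scale)
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # Local update: only members of the cluster are candidate medoids.
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(d, medoids)
```

Because the initialization is deterministic, repeated runs on the same data return the same medoids, which is the stability property the abstract emphasizes; how closely this sketch reproduces the paper's accuracy or convergence figures is not something the record allows one to claim.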
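Keywords: medical data; K-medoids algorithm; clustering; density optimization; variance
Classification: TP311.5 [Automation and Computer Technology: Computer Software and Theory]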