检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]青岛理工大学计算机工程学院,山东青岛266033
出 处:《计算机应用》2018年第1期104-109,115,共7页journal of Computer Applications
基 金:国家自然科学基金资助项目(61502262);山东省研究生教育创新计划项目(SDYY16023)~~
摘 要:针对传统K-均值算法对初始聚类中心选择较为敏感的问题,提出了一种基于融合集群度与距离均衡优化选择的K-均值聚类(K-MCD)算法。首先,基于"集群度"思想选取初始簇中心;然后,遵循所有聚类中心距离总和均衡优化的选择策略,获得最终初始簇中心;最后,对文本集进行向量化处理,并根据优化算法重新选取文本簇中心及聚类效果评价标准进行文本聚类分析。对文本数据集从准确性与稳定性两方面进行仿真实验分析,与K-均值算法相比,K-MCD算法在4个文本集上的聚类精确度分别提高了18.6、17.5、24.3与24.6个百分点;在平均进化代数方差方面,K-MCD算法比K-均值算法降低了36.99个百分点。仿真结果表明K-MCD算法能有效提高文本聚类精确度,并具有较好的稳定性。To deal with the problem that the traditional K-means algorithm is sensitive to the initial clustering center selection, an algorithm of K-Means clustering based on Clustering degree and Distance equalization optimization (K-MCD) was proposed. Firstly, the initial clustering center was selected based on the idea of "cluster degree". Secondly, the selection strategy of total clustering center distance equilibrium optimization was followed to obtain the final initial clustering center. Finally, the text set was vectorized, and the text cluster center and the evaluation criteria of text clustering were rese|ected to perform text clustering analysis according to the optimization algorithm. The analysis of simulation experiment for the text data set was carried out from the aspects of accuracy and stability. Compared with K-means algorithm, the clustering accuracy of K- MCD algorithm was improved by 18.6, 17.5, 24.3 and 24.6 percentage points respectively for four text sets; the average evolutionary algebraic variance of K-MCD algorithm was 36. 99 percentage points lower than K-means algorithm. The experimental results show that K-MCD algorithm can improve text clustering accuracy with good stability.
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.109