Affiliation: [1] School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004
Source: Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》), 2016, No. 4, pp. 311-314 (4 pages)
Funding: National 863 Program of China (2012AA011005)
Abstract: In text clustering, the k-means algorithm selects its initial cluster centroids at random, which can trap the clustering result in a local optimum; outliers and an undetermined number of clusters further lower its accuracy and slow its convergence. To address these problems, an improved k-means text clustering algorithm is proposed. The algorithm uses the FP-growth algorithm to mine frequent itemsets from the text, filters them to obtain core frequent itemsets, and uses the core frequent itemsets to generate the initial cluster centroids and the number of clusters. K-means then completes the text clustering with these initial centroids and this cluster count. Text clustering experiments on a Sina Weibo dataset show that the improved k-means algorithm improves clustering accuracy, accelerates convergence, and shows strong robustness.
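The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Several parts are assumptions made for illustration: a brute-force itemset counter stands in for FP-growth (it finds the same frequent itemsets on small data but scales far worse), "core" itemsets are taken to be the maximal frequent itemsets because the abstract does not state the filtering rule, documents are encoded as binary term vectors, and all function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Brute-force stand-in for FP-growth (assumption): count every
    candidate itemset up to max_size and keep those with enough support."""
    vocab = sorted({t for d in docs for t in d})
    found = {}
    for size in range(1, max_size + 1):
        for items in combinations(vocab, size):
            support = sum(1 for d in docs if set(items) <= d)
            if support >= min_support:
                found[frozenset(items)] = support
    return found


def core_itemsets(found):
    """Assumed filter: keep only maximal frequent itemsets, i.e. those
    that are not a proper subset of another frequent itemset."""
    return [s for s in found if not any(s < other for other in found)]


def vectorize(docs):
    """Encode each document as a binary term vector over the vocabulary."""
    vocab = sorted({t for d in docs for t in d})
    return [[1.0 if t in d else 0.0 for t in vocab] for d in docs]


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, seeds, max_iter=100):
    """Plain Lloyd iterations starting from the given seed centroids."""
    centroids = [list(c) for c in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean(members)
    return labels


def cluster(docs, min_support):
    """Derive k and the seed centroids from core itemsets, then run k-means."""
    core = core_itemsets(frequent_itemsets(docs, min_support))
    if not core:
        raise ValueError("no frequent itemsets; lower min_support")
    vectors = vectorize(docs)
    # One seed per core itemset: the mean vector of documents containing it.
    seeds = [mean([v for v, d in zip(vectors, docs) if s <= d]) for s in core]
    return len(core), kmeans(vectors, seeds)


if __name__ == "__main__":
    docs = [
        {"stock", "market", "rise"},
        {"stock", "market", "fall"},
        {"stock", "market", "bank"},
        {"football", "match", "goal"},
        {"football", "match", "win"},
        {"football", "match", "fans"},
    ]
    k, labels = cluster(docs, min_support=3)
    print(k, labels)
```

In this toy corpus the two maximal frequent itemsets, {stock, market} and {football, match}, fix both the number of clusters and the seed centroids, so k-means starts from topic-aligned centroids instead of random ones. A production version would swap in a real FP-growth implementation and weighted term vectors such as TF-IDF.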
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]