Hybrid Clustering Method Improved by Combining Weighting Factor and Feature Vector  (Cited by: 2)


Authors: Dong Yuehua (董跃华)[1], Guo Shichuan (郭士串)

Affiliation: [1] School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China

Source: Computer Applications and Software (《计算机应用与软件》), 2015, No. 11, pp. 264-268 (5 pages)

Funding: Jiangxi Provincial Graduate Innovation Special Fund Project (YC2013-S198)

Abstract: Representing text by feature-word weights alone has limitations, and the operators of the genetic K-means algorithm are inefficient. To address both problems, the paper first preprocesses the text by combining a feature-word weight factor (WF) and feature vectors with position weight information. On this basis, it improves the genetic K-means text clustering algorithm with a genetic control factor (GCF). GCF controls individuals during crossover and mutation, and the crossover and mutation probabilities are adjusted adaptively, which ensures that high-quality individuals pass into the next generation. Experiments show that the approach improves both the classification of feature words and the effective calculation of their weights, and that it raises text clustering accuracy.
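The abstract does not give the formula for GCF or for the adaptive crossover and mutation probabilities, so the following is only a minimal sketch of one widely used fitness-based adaptive scheme, not the authors' method. It illustrates how adaptive control can protect high-quality individuals: the closer an individual's fitness is to the population maximum, the lower its crossover and mutation probabilities. The function names, the constants (1.0 for crossover, 0.5 for mutation), and the sample fitness values are all assumptions made for illustration.

```python
def adaptive_probability(fitness, f_max, f_avg, k_above, k_below):
    """Fitness-based adaptive rate (illustrative scheme, not the paper's GCF).

    Individuals below the population average keep the fixed rate k_below.
    At or above the average, the rate shrinks linearly and reaches 0 for
    the best individual, so good solutions are disrupted less often.
    """
    # A degenerate population (all fitness values equal) would divide by
    # zero below, so it also falls back to the fixed rate.
    if fitness < f_avg or f_max <= f_avg:
        return k_below
    return k_above * (f_max - fitness) / (f_max - f_avg)


def crossover_and_mutation_rates(population_fitness, parent_fitness,
                                 individual_fitness):
    """Return (pc, pm) for one crossover candidate and one mutation candidate."""
    f_max = max(population_fitness)
    f_avg = sum(population_fitness) / len(population_fitness)
    # Assumed constants: crossover rate capped at 1.0, mutation rate at 0.5.
    pc = adaptive_probability(parent_fitness, f_max, f_avg, 1.0, 1.0)
    pm = adaptive_probability(individual_fitness, f_max, f_avg, 0.5, 0.5)
    return pc, pm


if __name__ == "__main__":
    fitness = [0.2, 0.4, 0.6, 0.8]  # hypothetical clustering fitness values
    for f in fitness:
        pc, pm = crossover_and_mutation_rates(fitness, f, f)
        print(f"fitness={f:.1f}  pc={pc:.2f}  pm={pm:.2f}")
```

In this sketch the best individual receives zero crossover and mutation probability, so it survives unchanged, while below-average individuals are recombined and mutated at the full rates. Any additional control applied by the paper's GCF would sit on top of such a scheme and is not modeled here.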

Keywords: text clustering; weight factor; feature vector; genetic control factor; genetic K-means

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
