Two-stage Text Feature Selection Algorithm Based on Differential Evolution  Cited by: 6


Authors: XIAO Xiaoli [1,2]; WU Yao; ZHOU Xiling; LIAO Zhuofan [1,2]

Affiliations: [1] College of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China; [2] Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China

Source: Computer Engineering (《计算机工程》), 2019, No. 2, pp. 303-309, 314 (8 pages in total)

Funding: National Natural Science Foundation of China (61402056)

Abstract: To reduce the dimensionality of the text feature space and improve the efficiency of data mining, a two-stage text feature selection algorithm is proposed. In the first stage, the variance and mean-median methods are combined to build a highly relevant feature subset, which performs a preliminary dimensionality reduction and serves as the initial feature population of the differential evolution algorithm. In the second stage, the differential evolution algorithm is improved: a fitness function is designed from the cumulative term frequency and document frequency of the feature words, and multiple feature difference vectors together with the local optimal feature are introduced into the mutation operation. This increases the perturbation of the feature subset, accelerates the convergence of the differential evolution algorithm, and yields the optimal feature subset. Experiments on the WebKB and Reuters-21578 datasets show that the algorithm outperforms algorithms such as TDM5 and MADAC in accuracy, recall and F1 value, reduces the dimensionality of the text feature space, and improves text clustering results.
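The two stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the mean-median score is taken as |mean - median| per term, the two filters are combined by intersecting their top-ranked terms, the fitness is a normalized term-frequency times document-frequency score with a size penalty, and the mutation is the standard DE/best/2 form (current best individual plus two difference vectors). The function names `stage1_filter`, `de_select` and `two_stage_select` are hypothetical and do not come from the paper.

```python
import numpy as np

def stage1_filter(X, keep_ratio=0.5):
    """Stage 1 (assumed form): keep terms ranked high by both variance and |mean - median|."""
    variance = X.var(axis=0)
    mean_median = np.abs(X.mean(axis=0) - np.median(X, axis=0))
    k = max(1, int(keep_ratio * X.shape[1]))
    top_var = set(np.argsort(variance)[-k:])
    top_mm = set(np.argsort(mean_median)[-k:])
    # Assumption: combine the two filters by intersection; fall back to variance only.
    selected = sorted(top_var & top_mm) or sorted(top_var)
    return np.array(selected)

def de_select(tf, df, n_docs, pop_size=20, gens=50, F=0.5, CR=0.9, seed=0):
    """Stage 2 (assumed form): differential evolution over real vectors thresholded at 0.5."""
    rng = np.random.default_rng(seed)
    # Assumed fitness ingredients: cumulative term frequency and document frequency.
    score = (tf / max(tf.max(), 1.0)) * (df / n_docs)
    penalty = score.mean()  # a term helps only if it scores above average

    def fitness(x):
        mask = x > 0.5
        return float(np.sum(score[mask] - penalty))

    d = tf.size
    pop = rng.random((pop_size, d))
    fit = np.array([fitness(x) for x in pop])
    for _ in range(gens):
        best = pop[np.argmax(fit)].copy()
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3, r4 = rng.choice(others, size=4, replace=False)
            # Mutation with the current best and two difference vectors (DE/best/2).
            v = best + F * (pop[r1] - pop[r2]) + F * (pop[r3] - pop[r4])
            v = np.clip(v, 0.0, 1.0)
            # Binomial crossover; at least one component always comes from the mutant.
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True
            u = np.where(cross, v, pop[i])
            fu = fitness(u)
            if fu >= fit[i]:  # greedy selection
                pop[i], fit[i] = u, fu
    winner = pop[np.argmax(fit)]
    return np.flatnonzero(winner > 0.5), float(fit.max())

def two_stage_select(X, keep_ratio=0.5, **de_kwargs):
    """Run stage 1, then search for a subset of the retained terms with DE."""
    kept = stage1_filter(X, keep_ratio)
    Xk = X[:, kept]
    tf = Xk.sum(axis=0)          # cumulative term frequency per retained term
    df = (Xk > 0).sum(axis=0)    # number of documents containing each term
    idx, fit = de_select(tf, df, X.shape[0], **de_kwargs)
    return kept[idx], fit
```

Here `X` is assumed to be a documents-by-terms matrix of non-negative term counts, and `two_stage_select(X)` returns the column indices of the selected terms together with the best fitness found. Running the search only on the terms retained by stage 1 is one way to read the abstract's statement that the stage-1 subset seeds the differential evolution population; the paper itself may initialize the population differently.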

Keywords: hybrid feature selection; dimensionality reduction; differential evolution algorithm; variance; mean median; text clustering

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
