MULTI-STAGE FEATURE SELECTION ALGORITHM IN TEXT CLUSTERING BASED ON IMPROVED GREY WOLF OPTIMIZATION

Times cited: 4

Authors: Liu Hongshuo, Wang Shiyao, Zhou Lingge, Zhang Jianfeng [1] (College of Information Engineering, Northwest A&F University, Yangling 712100, Shaanxi, China)

Institution: [1] College of Information Engineering, Northwest A&F University, Yangling 712100, Shaanxi, China

Source: Computer Applications and Software, 2023, No. 3, pp. 316-324 (9 pages)

Funding: Key Research and Development Program of Shaanxi Province (2019NY-164).

Abstract: To reduce the dimensionality of text features and improve clustering accuracy, a multi-stage feature selection and feature extraction algorithm based on improved grey wolf optimization (GWO) is proposed. First, relevant features are selected using the mean absolute difference and the mean median, and the resulting feature subsets are fused by merging (union) or crossing (intersection). Next, feature extraction based on cosine similarity yields a primary feature subset. Starting from this primary subset, an improved binary grey wolf optimization algorithm (IBGWO) searches for the optimal feature subset. Its fitness is defined from cumulative term frequency and document frequency, and opposition-based learning, nonlinear decay of the convergence coefficient, and elite opposition-based learning are introduced to improve the search performance of GWO. Experimental results show that the algorithm outperforms comparable algorithms in clustering accuracy, recall, and F1 score, effectively reducing feature dimensionality and improving clustering efficiency.
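The first two stages described in the abstract (filter-based selection, subset fusion, cosine-similarity step) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the following are assumptions: the mean absolute difference of term j is taken as MAD(j) = mean_i |x_ij - mean(x_j)|, the mean median score as MM(j) = |mean(x_j) - median(x_j)| (definitions commonly used in text feature selection work, not confirmed here), "merge" is read as set union and "cross" as set intersection, and the cosine step is interpreted as dropping a term whose document vector is nearly parallel (cosine above a hypothetical `max_sim` threshold) to a higher-ranked term already kept. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def column(matrix, j):
    """Weights of term j across all documents (rows are documents)."""
    return [row[j] for row in matrix]


def mad_score(values):
    """Mean absolute difference: average distance of each weight from the column mean."""
    m = mean(values)
    return mean(abs(v - m) for v in values)


def mm_score(values):
    """Mean-median score: absolute gap between the column mean and the column median."""
    return abs(mean(values) - median(values))


def top_k(matrix, score_fn, k):
    """Return the indices of the k highest-scoring terms plus all scores."""
    scores = {j: score_fn(column(matrix, j)) for j in range(len(matrix[0]))}
    ranked = sorted(scores, key=lambda j: (-scores[j], j))  # ties broken by index
    return set(ranked[:k]), scores


def cosine(u, v):
    """Cosine similarity of two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def primary_subset(matrix, k, mode="union", max_sim=0.95):
    """Hypothetical stages 1-2: MAD/MM filtering, set fusion, cosine redundancy pruning."""
    s_mad, mad = top_k(matrix, mad_score, k)
    s_mm, _ = top_k(matrix, mm_score, k)
    fused = s_mad | s_mm if mode == "union" else s_mad & s_mm
    kept = []
    # Visit fused terms from highest to lowest MAD score; skip a term whose
    # document vector is nearly parallel to a term that was already kept.
    for j in sorted(fused, key=lambda t: (-mad[t], t)):
        col_j = column(matrix, j)
        if all(cosine(col_j, column(matrix, i)) <= max_sim for i in kept):
            kept.append(j)
    return sorted(kept)


# Toy 4-document x 4-term weight matrix: term 1 is term 0 scaled by 2,
# and term 3 has the same weight in every document.
M = [[1, 2, 0, 1],
     [0, 0, 0, 1],
     [1, 2, 0, 1],
     [0, 0, 3, 1]]
print(primary_subset(M, 2))                       # [1, 2]
print(primary_subset(M, 2, mode="intersection"))  # [2]
```

In the toy matrix, MAD ranks terms 2 and 1 highest and MM ranks terms 2 and 0 highest; the union {0, 1, 2} is then pruned to [1, 2] because term 0 is exactly parallel to the higher-ranked term 1, while the intersection keeps only term 2. The constant term 3 scores zero on both filters.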

Keywords: feature selection; feature extraction; binary grey wolf optimization algorithm; opposition-based learning; text clustering
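The IBGWO stage described in the abstract, whose binary grey wolf optimization and opposition-based learning also appear among the keywords, can be sketched as a binary GWO with the three named enhancements. This is a minimal illustration under stated assumptions, not the authors' implementation: positions live in [0, 1] and are binarized by a hypothetical S-shaped (sigmoid) transfer with slope 10; the nonlinear decay is assumed to be a(t) = 2(1 - (t/T)^2); elite opposition-based learning is assumed to use the standard form k(lo + hi) - x over the dynamic bounds of the alpha, beta and delta wolves; and `hypothetical_fitness` is an invented stand-in that rewards the share of cumulative term frequency times document frequency covered by the subset, minus a size penalty `lam`.

```python
import math
import random
from itertools import product


def hypothetical_fitness(mask, ctf, df, n_docs, lam=0.5):
    """HYPOTHETICAL fitness (maximised): share of the total ctf*df/n_docs
    weight covered by the selected terms, minus a penalty on subset size."""
    weights = [c * d / n_docs for c, d in zip(ctf, df)]
    total = sum(weights) or 1.0
    kept = sum(w for w, bit in zip(weights, mask) if bit)
    return kept / total - lam * sum(mask) / len(mask)


def ibgwo(fitness, n_features, n_wolves=8, n_iter=30, seed=0):
    """Sketch of an improved binary GWO; returns (best_mask, best_fitness, history)."""
    rng = random.Random(seed)

    def make(pos):
        # Assumed S-shaped transfer: bit j is 1 with probability sigmoid(10 * (pos[j] - 0.5)).
        mask = [1 if rng.random() < 1.0 / (1.0 + math.exp(-10.0 * (p - 0.5))) else 0
                for p in pos]
        return (fitness(mask), pos, mask)

    # 1) Opposition-based initialisation: keep the best n_wolves of X and 1 - X.
    init = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    pool = [make(p) for p in init] + [make([1.0 - x for x in p]) for p in init]
    pool.sort(key=lambda w: w[0], reverse=True)
    pack, leaders = pool[:n_wolves], pool[:3]  # leaders = alpha, beta, delta
    history = [leaders[0][0]]

    for t in range(n_iter):
        # 2) Assumed nonlinear decay of the convergence coefficient a from 2 to 0.
        a = 2.0 * (1.0 - (t / n_iter) ** 2)
        new_pack = []
        for _, pos, _ in pack:
            new_pos = []
            for j, x in enumerate(pos):
                guided = 0.0
                for _, lead, _ in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    guided += lead[j] - A * abs(C * lead[j] - x)
                new_pos.append(min(1.0, max(0.0, guided / 3.0)))
            new_pack.append(make(new_pos))
        # 3) Elite opposition-based learning within the leaders' dynamic bounds.
        lo = [min(w[1][j] for w in leaders) for j in range(n_features)]
        hi = [max(w[1][j] for w in leaders) for j in range(n_features)]
        k = rng.random()
        opposites = [make([min(1.0, max(0.0, k * (lo[j] + hi[j]) - w[1][j]))
                           for j in range(n_features)]) for w in leaders]
        # Leaders are the best three solutions seen so far, so history never decreases.
        leaders = sorted(new_pack + opposites + leaders,
                         key=lambda w: w[0], reverse=True)[:3]
        pack = new_pack
        history.append(leaders[0][0])
    return leaders[0][2], leaders[0][0], history


ctf = [30, 2, 25, 1, 18, 3]   # toy cumulative term frequencies
df = [10, 1, 8, 1, 6, 2]      # toy document frequencies


def fit(mask):
    return hypothetical_fitness(mask, ctf, df, n_docs=12)


mask, best, history = ibgwo(fit, len(ctf))
brute = max(fit(list(m)) for m in product((0, 1), repeat=len(ctf)))
print(mask, round(best, 4), round(brute, 4))
```

Because the toy problem has only 2^6 = 64 subsets, the sketch compares the optimizer's result with an exhaustive search; on realistic vocabularies exhaustive search is infeasible, which is the reason for using a metaheuristic. Keeping the elite opposites only as leader candidates (rather than injecting them into the pack) is a design choice of this sketch.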

CLC number: TP393 [Automation and Computer Technology: Computer Application Technology]
