Authors: Yuan Mingfeng (袁明锋) [1], Bu Zhonghua (步中华) [2], Wang Qiang (王强) [3]
Affiliations: [1] Department of Big Data and Information Industry, Chongqing Vocational College of Light Industry (重庆轻工职业学院大数据与信息产业系), Chongqing 401329, China; [2] Qingdao Full Big Technology Co., Ltd. (青岛中石大科技创业有限公司), Qingdao 266580, Shandong, China; [3] School of Information, Qingdao University of Science and Technology (青岛理工大学信息学院), Qingdao 266580, Shandong, China
Source: Computer Applications and Software (《计算机应用与软件》), 2022, No. 10, pp. 274-284, 306 (12 pages in total)
Funding: Natural Science Foundation of Shandong Province (2018080712); Ministry of Education Industry-University Cooperation Project (2017HX00223).
Abstract: To reduce the dimensionality of the feature space and improve text clustering accuracy, a binary particle swarm optimization feature selection algorithm incorporating chaos and opposition-based learning is proposed. A new term weighting method is designed, and the text data are represented in the vector space model. An improved binary particle swarm optimization algorithm is proposed to solve the feature selection problem; a chaotic system and an opposition-based learning mechanism are introduced to optimize the particles' random search direction and the initial population distribution, respectively. In evaluating particle fitness, two measures, term variance and mean median, are used to assess feature subsets, and feature merging and crossover mechanisms are designed to combine the strengths of the two fitness measures and generate the optimal feature subset. The K-means algorithm is then used to cluster the texts after feature selection. The results show that the algorithm outperforms comparable algorithms in feature dimension reduction, clustering accuracy, and F-measure, and that it can effectively reduce the feature space dimension and improve text clustering performance.
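The abstract names the ingredients of the method but gives no formulas, so the following is only a minimal sketch of how some of them might fit together, not the authors' algorithm. Everything specific here is an assumption made for illustration: the logistic map as the chaotic system, bitwise complement as the "opposite" of a binary particle, a sigmoid transfer function, mean term variance as the only fitness measure, and all function names and parameter values. The term weighting scheme, the mean-median measure, the merging and crossover step, and K-means clustering are not covered.

```python
import math
import random


def logistic_map(x, r=4.0):
    """One step of the logistic map (assumed choice of chaotic system)."""
    return r * x * (1.0 - x)


def term_variance(docs, j):
    """Variance of term j's weight across all document vectors."""
    col = [d[j] for d in docs]
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)


def fitness(mask, variances):
    """Mean term variance of the selected features (0.0 if none selected)."""
    chosen = [v for bit, v in zip(mask, variances) if bit]
    return sum(chosen) / len(chosen) if chosen else 0.0


def opposition_init(n_particles, n_features, variances, rng):
    """Random binary masks plus their complements; keep the fittest half."""
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    opposite = [[1 - b for b in p] for p in pop]
    ranked = sorted(pop + opposite,
                    key=lambda m: fitness(m, variances), reverse=True)
    return ranked[:n_particles]


def chaotic_bpso(docs, n_particles=6, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO whose r1/r2 coefficients come from a chaotic sequence."""
    rng = random.Random(seed)
    n_features = len(docs[0])
    variances = [term_variance(docs, j) for j in range(n_features)]
    pos = opposition_init(n_particles, n_features, variances, rng)
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=lambda m: fitness(m, variances))[:]
    chaos = 0.37  # assumed starting value of the chaotic sequence
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_features):
                chaos = logistic_map(chaos)
                r1 = chaos
                chaos = logistic_map(chaos)
                r2 = chaos
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))  # sigmoid transfer
                pos[i][j] = 1 if rng.random() < prob else 0
            if fitness(pos[i], variances) > fitness(pbest[i], variances):
                pbest[i] = pos[i][:]
        best = max(pbest, key=lambda m: fitness(m, variances))
        if fitness(best, variances) > fitness(gbest, variances):
            gbest = best[:]
    return gbest


# Toy term-weight vectors (4 documents x 4 terms), made up for illustration.
docs = [[0.0, 0.9, 0.1, 0.5],
        [0.0, 0.1, 0.1, 0.4],
        [0.1, 0.8, 0.1, 0.6],
        [0.0, 0.2, 0.1, 0.5]]
print(chaotic_bpso(docs))  # binary mask: 1 = term kept, 0 = term dropped
```

Because this toy fitness has no penalty on subset size, it tends to favor very small subsets of high-variance terms; a practical fitness function would balance subset quality against the number of features retained.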