Authors: WANG Chen [1]; DONG Yong-quan [2]
Affiliations: [1] School of Information and Electronics Engineering, Jiangsu Vocational Institute of Architectural Technology, Xuzhou 221116, Jiangsu, China; [2] School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, Jiangsu, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2021, No. 9, pp. 2526-2535 (10 pages)
Funding: National Natural Science Foundation of China (61100167); Natural Science Foundation of Jiangsu Province (BK2011204).
Abstract: A text clustering algorithm with feature selection based on binary grey wolf optimization is proposed. To obtain the best clustering result, the text data are first represented in the vector space model. The binary grey wolf optimization algorithm then selects text features to produce preliminary feature subsets. The preliminary subsets obtained in that stage under different feature relevance scoring methods are combined by union and intersection operations to compute an optimal feature subset. On this new feature subset, a multi-objective K-means algorithm that considers cosine similarity and Euclidean distance simultaneously performs the clustering and yields the final text clustering solution. Experimental results show that, on most of the data sets, the algorithm effectively reduces the feature dimension and performs better on the clustering metrics.
Keywords: text clustering; binary grey wolf algorithm; K-means clustering; feature selection; selection merging; term weighting
Classification: TP393 [Automation and Computer Technology - Computer Application Technology]
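
Two steps named in the abstract, merging preliminary feature subsets by union and intersection and clustering with both cosine similarity and Euclidean distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how the two distance criteria are combined, so the weighted sum and its `alpha` parameter below are assumptions, as are all function names and the toy data. The grey wolf search itself is not shown; the two binary masks simply stand in for its output under two different relevance scoring methods.

```python
import math

def merge_masks(mask_a, mask_b):
    """Return (union, intersection) of two binary feature masks."""
    union = [a | b for a, b in zip(mask_a, mask_b)]
    inter = [a & b for a, b in zip(mask_a, mask_b)]
    return union, inter

def project(vec, mask):
    """Keep only the vector components whose mask bit is 1."""
    return [v for v, keep in zip(vec, mask) if keep]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (nu * nv)

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def blended_distance(u, v, alpha=0.5):
    # ASSUMPTION: a weighted sum stands in for the paper's
    # multi-objective formulation, which the abstract does not specify.
    return alpha * cosine_distance(u, v) + (1 - alpha) * euclidean_distance(u, v)

def assign(docs, centroids, alpha=0.5):
    """K-means assignment step: index of the nearest centroid per document."""
    return [
        min(range(len(centroids)),
            key=lambda k: blended_distance(d, centroids[k], alpha))
        for d in docs
    ]

# Hypothetical masks from two relevance scoring methods (toy data).
mask_a = [1, 0, 1, 1, 0]
mask_b = [1, 1, 0, 1, 0]
union, inter = merge_masks(mask_a, mask_b)

# Toy term-weight vectors, reduced to the features both methods selected.
docs = [[1.0, 0.0, 2.0, 0.0, 5.0], [0.0, 3.0, 0.0, 1.0, 5.0]]
reduced = [project(d, inter) for d in docs]
labels = assign(reduced, centroids=[[1.0, 0.0], [0.0, 1.0]])
```

The intersection keeps only features that both scoring methods agree on, so it gives the smaller subset, while the union keeps every feature that either method selected.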