Research on Optimization of Class-dependent Feature Selection Algorithm in Text Classification

Authors: LIU Yun (刘云)[1], XIAO Xue (肖雪), HUANG Rongcheng (黄荣乘) (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500)

Affiliation: [1] School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500

Source: Computer & Digital Engineering (《计算机与数字工程》), 2021, No. 10, pp. 2048-2051, 2117 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant No. 61262040).

Abstract: In text classification, a large number of redundant features increases computational complexity and lowers classification accuracy, so the dimensionality of the feature space must be reduced. This paper proposes a class-dependent (CD) feature selection algorithm. A relevance value (DR) is computed for every document in the training set, and a class threshold (CT) is then computed separately for each class. From each document whose value exceeds its class threshold, the largest feature is extracted in turn, yielding a feature vector for that class, so that each class can have its own number of features. Simulation results show that, compared with the IG-PSO and GA feature selection algorithms, the CD algorithm, by selecting feature subsets per class, improves both classification accuracy and the F1 score.

Keywords: chi-square statistic; naive Bayes classifier; feature selection

CLC number: TN929.5 [Electronics and Telecommunications / Communication and Information Systems]
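The abstract does not define how DR, CT or "the largest feature" are computed, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes: (1) term-class relevance is the chi-square statistic (listed among the keywords), computed from a 2x2 table of term presence versus class membership; (2) a document's DR is the sum of the chi-square scores of its distinct terms for the document's own class; (3) a class's CT is the mean DR of that class's training documents; and (4) the "largest feature" of a document is its highest-scoring term not yet selected for the class. The names `chi_square_scores`, `cd_select` and `train` are hypothetical, and the toy corpus is invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Doc = Tuple[List[str], str]  # (tokens, class label)


def chi_square_scores(docs: List[Doc]) -> Dict[str, Dict[str, float]]:
    """chi2(t, c) from a 2x2 table of term presence vs. class membership."""
    n = len(docs)
    class_count = Counter(label for _, label in docs)
    term_count: Counter = Counter()                   # documents containing t
    joint: Dict[str, Counter] = defaultdict(Counter)  # joint[c][t]: class-c docs containing t
    for tokens, label in docs:
        for t in set(tokens):
            term_count[t] += 1
            joint[label][t] += 1
    scores: Dict[str, Dict[str, float]] = {}
    for c, n_c in class_count.items():
        scores[c] = {}
        for t, n_t in term_count.items():
            a = joint[c][t]     # t present, class c
            b = n_t - a         # t present, other classes
            cc = n_c - a        # t absent, class c
            d = n - a - b - cc  # t absent, other classes
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            scores[c][t] = n * (a * d - b * cc) ** 2 / denom if denom else 0.0
    return scores


def cd_select(docs: List[Doc]) -> Dict[str, Set[str]]:
    """Hypothetical class-dependent selection: one feature per above-threshold document."""
    scores = chi_square_scores(docs)
    # Assumed DR: sum of chi2(t, c) over the document's distinct terms, c = its own class.
    dr = [math.fsum(scores[label][t] for t in set(tokens)) for tokens, label in docs]
    selected: Dict[str, Set[str]] = {}
    for c in scores:
        idx = [i for i, (_, label) in enumerate(docs) if label == c]
        ct = math.fsum(dr[i] for i in idx) / len(idx)  # assumed CT: mean DR of class c
        feats: Set[str] = set()
        # Visit documents whose DR exceeds CT, highest DR first.
        for i in sorted((i for i in idx if dr[i] > ct), key=lambda i: -dr[i]):
            candidates = set(docs[i][0]) - feats
            if candidates:
                # Assumed "largest feature": top-scoring unselected term, ties broken alphabetically.
                feats.add(min(candidates, key=lambda t: (-scores[c][t], t)))
        selected[c] = feats
    return selected


# Invented toy training set, for illustration only.
train: List[Doc] = [
    ("cheap pills buy now".split(), "spam"),
    ("buy cheap watches now".split(), "spam"),
    ("win cash now".split(), "spam"),
    ("meeting agenda attached".split(), "ham"),
    ("project meeting tomorrow".split(), "ham"),
    ("lunch tomorrow".split(), "ham"),
]
selected = cd_select(train)
print({c: sorted(f) for c, f in selected.items()})
# → {'spam': ['buy', 'now'], 'ham': ['meeting']}
```

On this toy corpus the spam class keeps two features and the ham class one, which mirrors the abstract's point that the number of selected features varies by class instead of being a single global cut-off. Training and scoring a classifier on these per-class features (the keywords mention a naive Bayes classifier) is not sketched here.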
