Research on Improvement of Short Text Classification Algorithm Based on MRMR and SVM

Authors: ZHANG Qichao (章启超); ZHOU Lianying (周莲英) [1]; DING Lachun (丁腊春)

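Affiliations: [1] School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013; [2] The Fourth People's Hospital of Zhenjiang City, Jiangsu Province, Zhenjiang 212001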

Source: Computer & Digital Engineering (《计算机与数字工程》), 2025, No. 1, pp. 164-169 (6 pages)

Abstract: Feature-set quality and classifier performance are two important factors that determine how well short text classification works. MRMR, an algorithm that selects features with maximum relevance and minimum redundancy, is commonly used for feature dimensionality reduction. This paper improves MRMR with an adjustment factor based on word distribution frequency: when the mutual information of a feature is computed, the factor lowers the weight of low-frequency feature words, which addresses the problem of low-frequency words appearing highly dependent on the labels. A support vector machine (SVM) then serves as the base classifier, and its parameters are tuned by a firefly algorithm extended with a variable step size factor, whose adaptivity resolves the oscillation and related problems of the firefly algorithm. Finally, the Adaboost framework iteratively trains several SVM base classifiers with different weights and combines them into a strong classifier with better performance. The method is validated on a short text dataset collected with a web crawler, using precision (P), recall (R), and F1 as evaluation criteria. Compared with the original algorithm, the optimized algorithm improves precision by 8%, recall by 10%, and F1 by 9%, and the authors conclude from these results that it has higher efficiency.
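The frequency-adjusted feature selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's adjustment factor, so `frequency_factor` below is a hypothetical stand-in (a log-scaled document frequency), and the greedy "relevance minus mean redundancy" score is one common textbook form of mRMR, not necessarily the variant the authors use. The toy terms and labels are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def frequency_factor(column):
    """Hypothetical adjustment factor (an assumption, not the paper's formula).

    Grows with the number of documents containing the term, so rare terms
    receive a smaller weight than frequent ones.
    """
    document_frequency = sum(column)
    return math.log(1 + document_frequency) / math.log(1 + len(column))


def mrmr_select(features, labels, k):
    """Greedy mRMR: pick k terms by adjusted relevance minus mean redundancy.

    features maps each term to a 0/1 presence list, one entry per document.
    """
    relevance = {
        term: frequency_factor(col) * mutual_information(col, labels)
        for term, col in features.items()
    }
    selected = []
    candidates = set(features)
    while candidates and len(selected) < k:

        def score(term):
            if not selected:
                return relevance[term]
            redundancy = sum(
                mutual_information(features[term], features[s]) for s in selected
            ) / len(selected)
            return relevance[term] - redundancy

        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy corpus: 9 documents in 3 classes (invented data).
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
features = {
    "goal":  [1, 1, 1, 0, 0, 0, 0, 0, 0],  # marks class 0
    "match": [1, 1, 0, 0, 0, 0, 0, 0, 0],  # largely redundant with "goal"
    "stock": [0, 0, 0, 1, 1, 1, 0, 0, 0],  # marks class 1
    "rare":  [0, 0, 0, 0, 0, 0, 1, 0, 0],  # low-frequency term
}
selected = mrmr_select(features, labels, 2)
print(selected)
```

In this sketch the factor shrinks the relevance of the single-occurrence term, while the redundancy penalty keeps the near-duplicate term from being chosen alongside the one it overlaps with.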

Keywords: short text classification; feature dimensionality reduction; MRMR algorithm; support vector machine; Adaboost

CLC number: TP301.6 [Automation and Computer Technology - Computer System Architecture]

 
