Authors: ZHANG Qichao, ZHOU Lianying [1], DING Lachun
Affiliations: [1] School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013; [2] The Fourth People's Hospital of Zhenjiang City, Jiangsu Province, Zhenjiang 212001
Source: Computer & Digital Engineering (《计算机与数字工程》), 2025, No. 1, pp. 164-169 (6 pages)
Abstract: Feature-set quality and classifier performance are two key factors in short text classification. The MRMR algorithm, which selects features for maximum relevance and minimum redundancy, is a commonly used feature dimensionality reduction algorithm. This paper improves it with an adjusting factor based on word distribution frequency: when the mutual information of a feature is computed, the factor lowers the weight of low-frequency feature words, which addresses the problem of high dependence between low-frequency words and class labels. A support vector machine (SVM) then serves as the base classifier, and its parameters are optimized by a firefly algorithm with an added variable step-size factor, whose adaptivity addresses the oscillation and related problems of the firefly algorithm. Finally, the Adaboost framework iteratively trains several SVM base classifiers with different weights and combines them into a stronger classifier. The method is validated on a short text dataset collected by a web crawler, with precision (P), recall (R), and F1 as the evaluation metrics. Compared with the original algorithm, the optimized algorithm improves precision by 8%, recall by 10%, and F1 by 9%, so the experimental results indicate that the optimized algorithm is more effective.
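The feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the formula for the adjusting factor, so `frequency_factor` below uses a hypothetical damping term df / (df + 1), where df is the number of documents containing the word. The function names, the difference form of the MRMR score (relevance minus mean redundancy), and the toy data are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def frequency_factor(column):
    """Hypothetical adjusting factor (an assumption, not the paper's formula).

    df / (df + 1) is close to 1 for frequent words and 0.5 for a word
    that occurs in a single document, so rare words are down-weighted.
    """
    df = sum(column)
    return df / (df + 1)


def mrmr_select(features, labels, k):
    """Greedy MRMR selection over binary word-presence features.

    features maps each word to a list of 0/1 flags, one per document.
    Relevance is the frequency-adjusted mutual information with the labels;
    redundancy is the mean mutual information with already selected words.
    """
    relevance = {
        word: frequency_factor(col) * mutual_information(col, labels)
        for word, col in features.items()
    }
    selected = []
    remaining = sorted(features)  # sorted so that ties break deterministically
    while remaining and len(selected) < k:

        def score(word):
            if not selected:
                return relevance[word]
            redundancy = sum(
                mutual_information(features[word], features[s]) for s in selected
            ) / len(selected)
            return relevance[word] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy corpus (invented for illustration): nine short texts in three classes.
labels = ["sport"] * 3 + ["finance"] * 3 + ["tech"] * 3
features = {
    "goal":  [1, 1, 1, 0, 0, 0, 0, 0, 0],  # frequent, marks "sport"
    "match": [1, 1, 1, 0, 0, 0, 0, 0, 0],  # duplicate of "goal" (redundant)
    "price": [0, 0, 0, 1, 1, 0, 0, 0, 0],  # marks "finance"
    "rare":  [0, 0, 0, 0, 0, 0, 1, 0, 0],  # occurs once, low frequency
}

print(mrmr_select(features, labels, 2))
```

On this toy data the redundancy term keeps the duplicate word "match" from being picked right after "goal", and the damping factor lowers the rank of the single-occurrence word "rare". A real pipeline would pass the selected features on to the SVM and Adaboost stages, which this sketch does not cover.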
Keywords: short text classification; feature dimensionality reduction; MRMR algorithm; support vector machine; Adaboost
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]