An Improved Random Forest Boost Multi-Label Text Classification Algorithm (cited by: 4)

Authors: Shao Mengliang [1]; Qi Deyu [2] (Department of Computer Science, Software Engineering Institute of Guangzhou, Guangzhou 510990, Guangdong, China; School of Computer Science and Technology, South China University of Technology, Guangzhou 510006, Guangdong, China)

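Affiliations: [1] Department of Computer Science, Software Engineering Institute of Guangzhou, Guangzhou 510990, Guangdong, China; [2] School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China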

Source: Computer Applications and Software, 2022, No. 11, pp. 215-221, 303 (8 pages)

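Funding: National Natural Science Foundation of China (61070015); Guangdong Province Frontier and Key Technology Innovation Project (2014B010110004); Guangdong Provincial Key Project for Regular Universities (Natural Science) (2019GZDXM020); Software Engineering Institute of Guangzhou University-Level Research Team Building Project (ST202002).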

Abstract: Existing boosting algorithms suffer from high computational cost and long learning times. To address this, an improved Random Forest Boost (RF-Boost) algorithm, IRF-Boost, is proposed. The training features are first ranked. In each boosting round, a smaller subset of the top-ranked features is filtered and used, and a single feature is selected according to its weight to build the new weak hypothesis, which reduces the size of the weak-hypothesis search space from k to 1. Seven feature ranking methods are tested and analyzed: information gain, chi-square, GSS coefficient, mutual information, odds ratio, F1 score, and accuracy. The experimental results show that, among the ranking methods evaluated, mutual information is best suited to RF-Boost, and that IRF-Boost is more efficient than both RF-Boost and AdaBoost.MH, making IRF-Boost a good choice for classification problems in practical applications and expert systems.
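The idea summarized in the abstract (rank the features once, then let each boosting round consider a single pre-ranked feature) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: the function names (`pmi`, `rank_features`, `train`, `predict`) are hypothetical, mutual information is taken in its pointwise form, the task is reduced to one binary label instead of the multi-label setting, each round simply takes the next feature in rank order, and the weak hypothesis is a one-term stump with real-valued outputs smoothed by a small `eps`.

```python
import math

def pmi(docs, labels, term):
    """Pointwise mutual information between a term and the positive label.
    Assumption: the paper's exact mutual-information formula is not given here."""
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for y in labels if y == 1)
    n_tc = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)
    if n_tc == 0:
        return float("-inf")  # the term never co-occurs with the label
    return math.log2(n_tc * n / (n_t * n_c))

def rank_features(docs, labels):
    """Rank the vocabulary once, before boosting starts (ties stay alphabetical)."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: pmi(docs, labels, t), reverse=True)

def train(docs, labels, rounds, eps=1e-3):
    """Boosting where round r only considers the r-th ranked feature,
    so the weak-hypothesis search space per round has size 1 (illustrative reading)."""
    n = len(docs)
    weights = [1.0 / n] * n
    model = []
    for term in rank_features(docs, labels)[:rounds]:
        # Weighted mass of each (term present?, label) cell, smoothed by eps.
        mass = {(b, y): eps for b in (True, False) for y in (1, -1)}
        for d, y, w in zip(docs, labels, weights):
            mass[(term in d, y)] += w
        c_present = 0.5 * math.log(mass[(True, 1)] / mass[(True, -1)])
        c_absent = 0.5 * math.log(mass[(False, 1)] / mass[(False, -1)])
        model.append((term, c_present, c_absent))
        # Re-weight: examples the new stump gets wrong gain weight.
        weights = [w * math.exp(-y * (c_present if term in d else c_absent))
                   for d, y, w in zip(docs, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, doc):
    """Sign of the summed stump outputs."""
    score = sum(cp if term in doc else ca for term, cp, ca in model)
    return 1 if score > 0 else -1

# Toy corpus: each document is a set of terms; the label is +1 or -1.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "match"}]
labels = [1, 1, -1, -1]
model = train(docs, labels, rounds=2)
predictions = [predict(model, d) for d in docs]
```

Because the ranking is computed once up front, each round costs a single pass over the training set rather than one pass per candidate feature, which is the kind of saving the abstract attributes to shrinking the search space.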

Keywords: boosting algorithm; feature ranking; multi-label learning; text classification; weak hypothesis

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
