Bagging-DyFAS: Model Ensemble Algorithm with Dynamic Fine-tuning

Authors: LI Gonglin; FAN Yichen; MI Yujian; LI Ming[1] (School of Economics and Management, Xi'an University of Technology, Xi'an, Shaanxi 710054, China)

Affiliation: [1] School of Economics and Management, Xi'an University of Technology, Xi'an 710054, China

Source: Journal of Computer Applications, 2023, No. S02, pp. 28-33 (6 pages)

Fund: Shaanxi Province College Students' Innovation and Entrepreneurship Training Program (S202210700148).

Abstract: To address the problems that a single model used for text classification is large and struggles with the diverse, non-standard expressions found in public opinion texts, a model ensemble algorithm based on the Bagging training idea with dynamic fine-tuning and secondary weighting (Bagging-DyFAS, Bagging-Dynamic Fine-tuning And Secondary weighting) was proposed. First, weak classifiers were trained on datasets constructed by bootstrap sampling, so that each classifier acquired some prior knowledge. Then, based on each classifier's performance on the development set, one dynamic weighting and one static weighting were applied, and the resulting series of weights was used to generalize the models to unlabeled data, further improving performance on text classification tasks. Experimental results on the constructed dataset show that, after one round of training, the proposed algorithm improves accuracy, precision, recall and F1 score by at least 3.6, 3.8, 1.3 and 3.2 percentage points respectively compared with the baseline models MiniBRT, BRT3 and LERT (Linguistically-motivated bidirectional Encoder Representation from Transformer), validating the effectiveness of the proposed algorithm.
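The abstract describes the Bagging-DyFAS pipeline only at a high level: bootstrap-trained weak classifiers, a dynamic weighting and a static weighting derived from development-set performance, then weighted inference on unlabeled data. The Python sketch below illustrates that general pattern under explicit assumptions: the function name, the scikit-learn stand-in classifiers, the dev-accuracy dynamic weight and the fixed-floor static weight are hypothetical choices for illustration, not the paper's actual models or weighting formulas.

# Illustrative sketch only: generic scikit-learn classifiers stand in for the
# paper's pretrained language models, and the two weighting passes below
# (dev-set accuracy as the "dynamic" weight, a fixed accuracy floor as the
# "static" weight) are assumed placeholders.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def bagging_with_secondary_weighting(base_model, X_train, y_train,
                                     X_dev, y_dev, X_unlabeled,
                                     n_members=5, static_floor=0.5, seed=0):
    """Train bagged weak classifiers, weight them twice, and label new data."""
    rng = np.random.default_rng(seed)
    members, dyn_weights = [], []

    # 1) Train each weak classifier on a bootstrap sample of the training set,
    #    so every member sees a slightly different view of the data.
    for _ in range(n_members):
        idx = rng.integers(0, len(X_train), size=len(X_train))
        model = clone(base_model).fit(X_train[idx], y_train[idx])
        members.append(model)
        # 2) "Dynamic" weight: this member's accuracy on the development set.
        dyn_weights.append(accuracy_score(y_dev, model.predict(X_dev)))

    dyn_weights = np.asarray(dyn_weights)
    # 3) "Static" secondary weighting: drop members below a fixed accuracy
    #    floor, then renormalize the remaining weights.
    weights = dyn_weights * (dyn_weights >= static_floor)
    if weights.sum() == 0:          # fall back if every member is filtered out
        weights = dyn_weights
    weights = weights / weights.sum()

    # 4) Apply the weights as soft voting over the unlabeled data.
    probas = np.stack([m.predict_proba(X_unlabeled) for m in members])  # (K, N, C)
    ensemble_proba = np.tensordot(weights, probas, axes=1)              # (N, C)
    return ensemble_proba.argmax(axis=1)


# Example usage with a simple stand-in classifier and NumPy feature arrays:
# preds = bagging_with_secondary_weighting(LogisticRegression(max_iter=1000),
#                                          X_train, y_train, X_dev, y_dev, X_new)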

Keywords: text classification; model ensemble; secondary weighting; dynamic weighting; public opinion analysis; pre-trained language model

CLC number: TP391.1 [Automation and Computer Technology - Computer Application Technology]; TP18 [Automation and Computer Technology - Computer Science and Technology]

 
