Few-shot Scientific and Technological Literature Classification Method Based on BERT-Prototypical Model


Authors: 白文清 (BAI Wenqing); 崔彩霞 (CUI Caixia) [1] (College of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619, China)

Affiliation: [1] College of Computer Science and Technology, Taiyuan Normal University, Jinzhong, Shanxi 030619, China

Source: Software Guide (《软件导刊》), 2025, Issue 4, pp. 42-47 (6 pages)

Funding: Shanxi Province Basic Research Program (Free Exploration) Project (20210302123334).

Abstract: As disciplines are continually subdivided and develop at uneven rates, some disciplines have very little data available for training classifiers, which makes the classification of scientific and technological literature difficult. To address the severe long-tail problem in this literature, where traditional text classification methods can no longer improve classification results, a few-shot scientific and technological literature classification method based on the BERT-Prototypical model is proposed. The model builds on the prototypical network from transfer learning. It first uses the BERT pre-trained model to mine the relationships between scientific literature texts and obtain better feature representations; it then feeds the encoded text features into the prototypical network and improves classification performance by optimizing the network's encoding method and parameter settings. Experimental results show that the BERT-Prototypical model reaches a classification accuracy of 95.6% on the 5-way 20-shot task; on the 5-way 5-shot task, where samples are limited, it reaches 78.4%, an improvement over the control model.
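The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no details on the distance metric, the encoder configuration, or the training procedure. The sketch assumes that each document has already been encoded into a fixed-length vector (for example by a BERT encoder), that a class prototype is the mean of that class's support embeddings, and that a query is assigned to the nearest prototype under squared Euclidean distance, which is the usual choice for prototypical networks. The function names and the toy 2-way 2-shot data are hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype.

    `support` is a list of (label, embedding) pairs. The embeddings stand in
    for sentence vectors from an encoder such as BERT (an assumption here).
    """
    grouped = defaultdict(list)
    for label, vec in support:
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return (predicted label, class probabilities) for one query embedding.

    Probabilities are a softmax over negative squared distances to the
    prototypes, so the nearest prototype gets the highest probability.
    """
    logits = {
        label: -squared_distance(query, proto)
        for label, proto in prototypes.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exp = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exp.values())
    probs = {label: e / total for label, e in exp.items()}
    return max(probs, key=probs.get), probs


# Toy 2-way 2-shot episode with made-up 2-dimensional "embeddings".
support = [
    ("physics", [1.0, 0.0]), ("physics", [0.8, 0.2]),
    ("biology", [0.0, 1.0]), ("biology", [0.2, 0.8]),
]
prototypes = class_prototypes(support)
label, probs = classify([0.9, 0.2], prototypes)
print(label, probs)
```

In the 5-way 5-shot setting mentioned in the abstract, `support` would hold five labeled embeddings for each of five classes. In a trained model the encoder is typically optimized with a cross-entropy loss over these probabilities, though the record does not state how the authors train theirs.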

Keywords: scientific and technological literature classification; few-shot learning; prototypical network; BERT model; imbalanced data

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
