Textual Data Augmentation Blending TF-IDF and Pre-Trained Model

Cited by: 2

Authors: HU Rong-sheng (胡荣笙); CHE Wen-gang (车文刚) [1]; ZHANG Long (张龙); DAI Pang-da (戴庞达) (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China)

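Affiliation: [1] School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China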

Source: Computer Simulation (《计算机仿真》), 2024, No. 5, pp. 495-500 (6 pages)

Funding: National Natural Science Foundation of China (62102395); Anhui Provincial Science and Technology Major Project (202003a05020020).

Abstract: To address data augmentation in natural language processing, this paper proposes a textual data augmentation (TDA) method that blends the TF-IDF algorithm with the pre-trained language model BERT, which the authors describe as the first of its kind. First, instead of the traditional random token selection strategy, the method uses TF-IDF to extract the non-core (least informative) words of a sample as the target tokens for replacement, which avoids rewriting tokens that play a key role in semantics. Second, because existing algorithms depend on the input sample when generating new data, which limits the diversity of the augmented samples, the method uses BERT to predict the target tokens and replaces them with the predicted results. Experimental results show that the proposed algorithm improves the performance of deep learning models by 5.8% and outperforms existing TDA algorithms.
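The two steps described in the abstract (pick low-information tokens with TF-IDF, then replace them with model predictions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact TF-IDF formula, the number of tokens replaced, or the tokenization, so the smoothed IDF, the parameter `k`, and the function names `tfidf_scores`, `select_targets`, and `augment` are all hypothetical. The `predict` argument is a stand-in for a masked language model such as BERT; any callable that returns a token for a masked position can be plugged in.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

MASK = "[MASK]"  # placeholder for the position handed to the predictor


def tfidf_scores(doc: Sequence[str], corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Return a TF-IDF score for each distinct token in `doc`.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1. The paper's
    exact variant is not stated in the abstract.
    """
    tf = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for tok, count in tf.items():
        df = sum(1 for d in corpus if tok in d)  # documents containing tok
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[tok] = (count / len(doc)) * idf
    return scores


def select_targets(doc: Sequence[str], corpus: Sequence[Sequence[str]], k: int) -> List[int]:
    """Return positions of the k lowest-scoring (least informative) tokens."""
    scores = tfidf_scores(doc, corpus)
    # Ties are broken by position so the result is deterministic.
    order = sorted(range(len(doc)), key=lambda i: (scores[doc[i]], i))
    return sorted(order[:k])


def augment(
    doc: Sequence[str],
    corpus: Sequence[Sequence[str]],
    k: int,
    predict: Callable[[List[str], int], str],
) -> List[str]:
    """Mask each target position in turn and fill it with `predict`'s output.

    `predict(masked_tokens, position)` is hypothetical: in the setting the
    abstract describes, it would wrap a masked language model such as BERT.
    """
    out = list(doc)
    for pos in select_targets(doc, corpus, k):
        masked = out[:pos] + [MASK] + out[pos + 1:]
        out[pos] = predict(masked, pos)
    return out
```

Keeping the predictor as a plain callable separates the selection step from the model, so the selection logic can be checked with a stub before a real masked language model is attached.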

Keywords: natural language processing; deep learning; textual data augmentation; pre-trained language model

Classification: TP391.9 [Automation and Computer Technology: Computer Application Technology]
