An alert-situation text data augmentation method based on MLM  


Authors: DING Weijie, MAO Tingyun, CHEN Lili, ZHOU Mingwei, YUAN Ying, HU Wentao. 丁伟杰 (DING Weijie): Department of Computer and Information Security, Zhejiang Police College, Hangzhou 310053, P.R. China; Key Laboratory of Public Security Information Application Based on Big-Data Architecture, Ministry of Public Security, Hangzhou 310053, P.R. China

Affiliations: [1] Department of Computer and Information Security, Zhejiang Police College, Hangzhou 310053, P.R. China; [2] Key Laboratory of Public Security Information Application Based on Big-Data Architecture, Ministry of Public Security, Hangzhou 310053, P.R. China; [3] Zhejiang Dahua Technology Co., Ltd, Hangzhou 310053, P.R. China

Source: High Technology Letters (高技术通讯, English edition), 2024, No. 4, pp. 389-396 (8 pages)

Funding: Supported by the Humanities and Social Sciences Research Project of the Ministry of Education (No. 22YJA840004).

Abstract: The performance of deep learning models relies heavily on the quality and quantity of training data, and insufficient training data can lead to overfitting. However, in alert-situation text classification it is usually difficult to obtain a large amount of training data. This paper proposes a text data augmentation method based on a masked language model (MLM), aiming to improve the generalization capability of deep learning models by expanding the training data. The method uses a Mask strategy to randomly conceal words in the text and then uses the MLM, which exploits contextual information, to predict and replace the masked words, thereby generating new training data. Three Mask strategies are designed (character-level, word-level, and N-gram), and the performance of each strategy under different Mask ratios is analyzed. The experimental results show that the word-level Mask strategy outperforms traditional data augmentation methods.

Keywords: deep learning; text data augmentation; masked language model (MLM); alert-situation text classification

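CLC numbers: TP391.1 [Automation and Computer Technology: Computer Application Technology]; D631 [Automation and Computer Technology: Computer Science and Technology]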
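The abstract describes the mask-and-predict idea only at a high level. The following is a minimal sketch of that general idea, not the authors' implementation. Assumptions: `predict` is a hypothetical stand-in for an MLM (in practice it might wrap a Chinese BERT model, for example through the Hugging Face `transformers` `fill-mask` pipeline); the mask ratio is taken as the fraction of units to mask; and N-gram spans are counted over whatever units are passed in (characters or pre-segmented words). The function names `choose_mask_positions` and `augment` and the example sentence are illustrative only.

```python
import random
from typing import Callable, List, Sequence

MASK = "[MASK]"

# Hypothetical MLM interface: given the masked sequence and the index of one
# masked unit, return a replacement unit (e.g. the model's top prediction).
Predictor = Callable[[List[str], int], str]


def choose_mask_positions(n_units: int, ratio: float, span: int,
                          rng: random.Random) -> List[int]:
    """Pick unit indices so that roughly `ratio` of the units get masked.

    span == 1 masks isolated units (character- or word-level);
    span > 1 masks contiguous N-gram spans and may overshoot the target.
    """
    if n_units == 0 or ratio <= 0:
        return []
    target = max(1, round(n_units * ratio))
    starts = list(range(max(1, n_units - span + 1)))  # valid span start indices
    rng.shuffle(starts)
    covered = set()
    for s in starts:
        if len(covered) >= target:
            break
        covered.update(range(s, min(s + span, n_units)))
    return sorted(covered)


def augment(units: Sequence[str], ratio: float, span: int,
            predict: Predictor, rng: random.Random) -> List[str]:
    """Mask part of `units`, then fill every mask with `predict`."""
    positions = choose_mask_positions(len(units), ratio, span, rng)
    chosen = set(positions)
    masked = [MASK if i in chosen else u for i, u in enumerate(units)]
    filled = list(masked)
    for i in positions:
        # Each prediction sees the same masked context (all masks still present).
        filled[i] = predict(masked, i)
    return filled


def placeholder_predict(masked: List[str], index: int) -> str:
    """Hypothetical stand-in for a real MLM predictor."""
    return "#"


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "嫌疑人驾驶车辆逃离现场"  # illustrative sentence, not from the paper
    words = ["嫌疑人", "驾驶", "车辆", "逃离", "现场"]  # assumed pre-segmented
    print("".join(augment(list(sentence), 0.2, 1, placeholder_predict, rng)))  # character level
    print("".join(augment(words, 0.2, 1, placeholder_predict, rng)))           # word level
    print("".join(augment(list(sentence), 0.3, 2, placeholder_predict, rng)))  # bigram spans
```

Passing characters or pre-segmented words as `units` switches between the character-level and word-level strategies, and `span > 1` gives N-gram masking. A real word-level predictor built on a character-based Chinese BERT would also have to decide how many `[MASK]` tokens a multi-character word occupies; this sketch leaves that to the hypothetical `predict`.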
