A BERT-BiGRU-WCELoss Short-Text Alert Classification Model for Handling Severely Imbalanced Data


Authors: Liu Dong; Weng Haiguang; Chen Yimin (Shanghai Police College, Shanghai 200137, China; Shanghai Jian Qiao University, Shanghai 201306, China)


Source: Computer Applications and Software (《计算机应用与软件》), 2024, No. 9, pp. 217-223, 229 (8 pages)

Funding: Shanghai Police College Research Project (23xkx53)

Abstract: Text records of 110 emergency-call alerts are extremely short, and their class distribution is severely imbalanced. To address these problems, a BERT-BiGRU-WCELoss alert classification model is proposed. The model extracts text semantics with a Chinese pre-trained BERT (Bidirectional Encoder Representations from Transformers) model; a BiGRU (Bidirectional Gated Recurrent Unit) then further distills the semantic features; and an optimized adaptive weighted loss function, WCELoss (Weighted Cross-Entropy Loss), assigns larger loss weights to minority-class samples. Experimental results show that the model achieves a classification accuracy of 95.83% on a dataset of 110 emergency calls from one calendar month of 2015 in a certain city, and that its precision, recall, F1 score, and G-mean all exceed those of traditional deep learning models and of models trained with the standard cross-entropy loss.
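The core idea behind the loss term — giving minority classes larger weights so their errors contribute more to training — can be illustrated with a minimal sketch. Note this is not the paper's exact adaptive WCELoss formulation, whose details are in the full text; the sketch below assumes simple inverse-frequency weights, a common baseline choice:

```python
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights: a class seen count[c] times in
    n samples over k classes gets weight n / (k * count[c]), so rarer
    classes receive larger weights (weights average to 1)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def weighted_cross_entropy(probs, label, weights):
    """Weighted cross-entropy for one sample: -w_y * log p_y,
    where probs maps each class to its predicted probability."""
    return -weights[label] * math.log(probs[label])

# Toy imbalanced label set: class 0 dominates 90:10.
labels = [0] * 90 + [1] * 10
w = class_weights(labels)
# Minority class 1 gets weight 5.0; majority class 0 gets ~0.556,
# so a misclassified minority sample is penalized ~9x more heavily.
loss_minority = weighted_cross_entropy({0: 0.5, 1: 0.5}, 1, w)
loss_majority = weighted_cross_entropy({0: 0.5, 1: 0.5}, 0, w)
```

With equal predicted probabilities, the minority-class sample incurs the larger loss, which is exactly the effect the model relies on to keep rare alert categories from being ignored during training.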

Keywords: BERT; BiGRU; alert classification; imbalanced data; short text; sample weighting

CLC number: TP3 [Automation and Computer Technology — Computer Science and Technology]

 
