Research on a Named Entity Recognition Method for Archives Based on RoBERTa-BiLSTM-GCN-CRF

Authors: ZHAO Junyi; ZHOU Peng [1]; YU Xinjie [2]

Affiliations: [1] College of Electrical and Information Engineering, Hubei School of Automotive Technology, Shiyan, Hubei 442002; [2] Ningbo Institute of Technology, Zhejiang University, Ningbo, Zhejiang 315100

Source: Journal of Shanxi Datong University (Natural Science Edition), 2025, No. 1, pp. 19-25 (7 pages)

Funding: Ningbo High-tech Zone 2023 Major Science and Technology Special Project [2023CX050007].

Abstract: Objective: To address the difficulty of understanding domain-specific terminology in archival named entity recognition, and to improve the efficiency and accuracy of digital archive management. Methods: A named entity recognition model based on RoBERTa-BiLSTM-GCN-CRF is proposed for the archival domain. First, the pre-trained model RoBERTa produces token vectors that carry rich semantic information, which addresses the problem of archival terminology. These vectors are then passed to a bidirectional long short-term memory (BiLSTM) network, which strengthens the model's grasp of sequence information. Next, a graph convolutional network (GCN) captures the complex relationships between words in the text. Finally, a conditional random field (CRF) outputs the entity labels. Results: Low-classification archival texts provided by the Ningbo Archives in Zhejiang Province were collected and curated; after preprocessing, they formed the training, validation, and evaluation sets used in the entity recognition experiments. The RoBERTa-BiLSTM-GCN-CRF model achieves a precision of 96.20%, a recall of 95.83%, and an F1 score of 96.02%, an improvement over existing models. Conclusion: The RoBERTa-BiLSTM-GCN-CRF model performs well on archival entity recognition and effectively addresses the challenges of named entity recognition in the archival domain.
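The last two stages of the pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say how the word graph is built or which GCN normalization is used, so the symmetric normalization with self-loops, the edge list, the label set, and all scores below are assumptions chosen for illustration. The RoBERTa and BiLSTM stages are omitted and replaced by toy feature vectors and emission scores.

```python
import math

def gcn_layer(features, edges, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Assumption: symmetric normalization with self-loops; the abstract
    does not specify the variant used in the paper.
    """
    n = len(features)
    # Adjacency matrix with self-loops; edges are treated as undirected.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    out_dim = len(weight[0])
    out = []
    for i in range(n):
        # Aggregate degree-normalized features of node i and its neighbours.
        agg = [0.0] * len(features[0])
        for j in range(n):
            if adj[i][j]:
                norm = 1.0 / math.sqrt(deg[i] * deg[j])
                for k, value in enumerate(features[j]):
                    agg[k] += norm * value
        # Linear projection followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][d] for k in range(len(agg))))
                    for d in range(out_dim)])
    return out

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label index sequence of a linear-chain CRF."""
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        step, new_score = [], []
        for cur in range(n_labels):
            # Best previous label for ending in `cur` at this position.
            prev = max(range(n_labels), key=lambda p: score[p] + transitions[p][cur])
            step.append(prev)
            new_score.append(score[prev] + transitions[prev][cur] + emit[cur])
        backpointers.append(step)
        score = new_score
    # Trace the best path backwards from the best final label.
    best = max(range(n_labels), key=lambda label: score[label])
    path = [best]
    for step in reversed(backpointers):
        best = step[best]
        path.append(best)
    return path[::-1]

# Hypothetical BIO label set and toy scores for a four-token sentence.
labels = ["O", "B-ORG", "I-ORG"]
emissions = [[2.0, 0.5, 0.1], [0.2, 1.5, 1.0], [0.3, 0.4, 1.8], [1.9, 0.2, 0.6]]
transitions = [[0.5, 0.2, -1e4],   # O -> I-ORG is effectively forbidden
               [0.0, -0.5, 0.8],
               [0.3, -0.2, 0.5]]
print([labels[i] for i in viterbi_decode(emissions, transitions)])
# ['O', 'B-ORG', 'I-ORG', 'O']
```

In a full model the emission scores would come from a linear layer over the GCN output rather than being written by hand, and the transition scores would be learned during training. The large negative entry shows how a CRF can rule out invalid tag sequences such as an `I-ORG` directly after `O`.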

Keywords: named entity recognition; deep learning; archives; pre-trained model

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
