Named entity recognition for ancient texts based on incremental pretraining and adversarial learning


Authors: REN Le; ZHANG Yang-sen [1]; LI Jian-long; SUN Yuan-ming; LIU Shuai-kang

Affiliations: [1] Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China; [2] School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China

Source: Computer Engineering and Design, 2025, No. 4, pp. 1190-1197 (8 pages)

Funding: National Natural Science Foundation of China (62176023)

摘  要:针对用于古籍命名实体识别古籍语料少、古文信息熵高的问题,构建基于二十四史的古籍文本语料库,并提出一种基于增量预训练和对抗学习的古籍命名实体识别模型(ANER-IPAL)。基于自建的古籍文本数据集,使用NEZHA-TCN模型进行预训练,在嵌入层融合对抗学习增强模型泛化能力,在解码层引入全局指针网络,将实体识别任务建模为子串提取任务,结合规则进行结果的矫正输出。实验结果表明,所提模型在“古籍命名实体识别2023”数据集(GuNER2023)上的F1值达到了95.34%,相较于基线模型NEZHA-GP提高了4.19%。Aiming at the problem that the shortage of corpus resources for ancient text dataset,and the high entropy of ancient texts,a corpus of ancient Chinese dataset based on Twenty-Four Histories was constructed,and a model for ancient named entity recognition based on incremental pretraining and adversarial learning(ANER-IPAL)was proposed.Based on the self-built ancient text dataset,the NEZHA-TCN architecture was used to pre-train.To enhance the generalization capability,an adversa-rial learning method was fused at the embedding layer.At the decoding layer,the global pointer network was introduced to model the entity recognition task as a subsequence extraction task.Rules were combined for correcting and outputting the results.Experimental results on the Ancient Text Named Entity Recognition 2023 dataset(GuNER2023)show that the proposed model achieves an F1 score of 95.34%,improving 4.19%compared to that of the baseline model NEZHA-GP.

Keywords: Twenty-Four Histories; named entity recognition for ancient texts; incremental pretraining; temporal convolutional network; adversarial learning; global pointer; substring extraction

CLC number: TP391.1 (Automation and Computer Technology: Computer Application Technology)