Named Entity Recognition for Chinese Electronic Medical Records Using MacBERT and Global Pointer Network


Authors: WU Tian-yu; GUO Dong-dong; LI Wen-qiao; LI Zi-kang; MIAO Lin (Computer School, Beijing Information Science and Technology University, Beijing 100101, China)

Affiliation: [1] Computer School, Beijing Information Science and Technology University, Beijing 100101, China

Source: Science Technology and Engineering, 2025, Issue 11, pp. 4656-4665 (10 pages)

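Funding: National Key Research and Development Program of China (2021YFB2600600); Beijing Information Science and Technology University university-level research projects (2023XJJ15, 2023XJJ17).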

Abstract: Existing sequence labeling methods cannot effectively recognize nested entities in Chinese electronic medical records. To address this limitation, a named entity recognition model based on MacBERT and a global pointer network is proposed. First, the MacBERT-large pre-trained model converts the text into context-sensitive dynamic vectors. Next, the fast gradient method (FGM) generates adversarial samples, which are added to the original vectors and fed together into a bi-directional long short-term memory (BiLSTM) network to capture contextual features; an attention mechanism is introduced to strengthen the capture of long-distance semantic features. Finally, a global pointer network decodes by considering head and tail feature information simultaneously, yielding better predictions for nested medical entities. Experimental results show that, compared with the global pointer network model, a strong mainstream baseline, the proposed model improves F1-score by 1.8%, 1.37%, and 1.72% on the CCKS2019 dataset and two versions of the CMeEE Chinese electronic medical record dataset, respectively, demonstrating the effectiveness of the model.
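The decoding step described in the abstract, which scores a span from its head and tail features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `span_scores` and `decode_spans`, the toy query/key vectors, and the threshold of 0 are assumptions for illustration, and the sketch omits the learned projections, position encoding, and per-entity-type scoring that a full global pointer layer would include. It only shows why span-level scoring can return nested entities, which a single tag per token cannot.

```python
from typing import List, Tuple

Vector = List[float]


def span_scores(queries: List[Vector], keys: List[Vector]) -> List[List[float]]:
    """Score every span (i, j) with i <= j as the dot product of the
    head-token query vector and the tail-token key vector."""
    n = len(queries)
    # Spans with i > j are invalid, so they keep a score of -inf.
    scores = [[float("-inf")] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            scores[i][j] = sum(a * b for a, b in zip(queries[i], keys[j]))
    return scores


def decode_spans(scores: List[List[float]], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Keep every valid span whose score exceeds the threshold.
    Spans are judged independently, so overlapping or nested spans can coexist."""
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i, n) if scores[i][j] > threshold]


# Hypothetical head (query) and tail (key) features for a 3-token sentence
# and a single entity type; real values would come from the encoder.
queries = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
keys = [[-1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

spans = decode_spans(span_scores(queries, keys))
print(spans)  # [(0, 2), (1, 1)]
```

In this toy input the span covering tokens 0 to 2 and the single-token span at position 1 are both returned, so the inner entity is recovered alongside the outer one.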

Keywords: named entity recognition; Chinese electronic medical records; global pointer network; attention mechanism

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]

 
