Authors: 吴天宇 (WU Tian-yu); 郭冬冬 (GUO Dong-dong); 李文桥 (LI Wen-qiao); 李子康 (LI Zi-kang); 苗琳 (MIAO Lin) (Computer School, Beijing Information Science and Technology University, Beijing 100101, China)
Affiliation: [1] Computer School, Beijing Information Science and Technology University, Beijing 100101
Source: Science Technology and Engineering (《科学技术与工程》), 2025, No. 11, pp. 4656-4665 (10 pages)
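Funding: National Key Research and Development Program of China (2021YFB2600600); Beijing Information Science and Technology University university-level research projects (2023XJJ15, 2023XJJ17).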
Abstract: Existing sequence labeling approaches cannot effectively recognize nested entities in Chinese electronic health records (EHRs). To address this, a named entity recognition model for Chinese EHRs that integrates MacBERT and a global pointer network is proposed. First, the MacBERT-large pre-trained model converts the text into context-sensitive dynamic vectors. Next, the fast gradient method (FGM) generates adversarial samples, which are added to the original vectors and fed together into a BiLSTM (bi-directional long short-term memory) network to capture contextual features; an attention mechanism is introduced to strengthen the capture of long-distance semantic features. Finally, a global pointer network decodes the result while considering head and tail feature information simultaneously, yielding better predictions for nested medical entities. Experimental results show that, compared with the global pointer network, a well-performing mainstream baseline, the proposed model improves the F1-score by 1.8%, 1.37%, and 1.72% on the CCKS2019 dataset and two versions of the CMeEE Chinese EHR dataset, respectively, which demonstrates the effectiveness of the model.
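The final decoding step can be illustrated with a minimal sketch. It assumes the common global pointer convention in which each entity type has a matrix of span scores indexed by (head position, tail position), and every span with a score above a threshold (0 here) is emitted. The function name `decode_spans`, the labels `body` and `disease`, and all score values are hypothetical and not taken from the paper. Because every (head, tail) pair is scored independently, overlapping spans such as a body part inside a disease mention can both be returned, which sequence labeling with one tag per token cannot express.

```python
from typing import Dict, List, Tuple

# (entity type, head index, tail index), with the tail index inclusive
Span = Tuple[str, int, int]


def decode_spans(scores: Dict[str, List[List[float]]],
                 threshold: float = 0.0) -> List[Span]:
    """Return every (type, head, tail) span whose score exceeds the threshold.

    scores[label][i][j] is an assumed span score for a span that starts at
    token i and ends at token j. Only head <= tail is considered.
    """
    spans: List[Span] = []
    for label, matrix in scores.items():
        for head, row in enumerate(matrix):
            for tail in range(head, len(row)):  # upper triangle only
                if row[tail] > threshold:
                    spans.append((label, head, tail))
    return spans


# Toy input: three tokens, e.g. ["left", "lung", "tumor"] (illustrative only).
NEG = -5.0  # made-up low score meaning "not an entity"
toy_scores = {
    "body": [[NEG, 2.0, NEG], [NEG, NEG, NEG], [NEG, NEG, NEG]],
    "disease": [[NEG, NEG, 1.5], [NEG, NEG, NEG], [NEG, NEG, NEG]],
}

nested = decode_spans(toy_scores)
print(nested)  # the "body" span (0, 1) lies inside the "disease" span (0, 2)
```

In this toy case the span covering tokens 0 to 1 and the span covering tokens 0 to 2 are both kept, which is the nested case the abstract targets.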
Keywords: named entity recognition; Chinese electronic health records; global pointer network; attention mechanism
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]