Authors: 蔡晓琼 (CAI Xiaoqiong); 郑增亮 (ZHENG Zengliang); 苏前敏 (SU Qianmin)[1]; 郭晶磊 (GUO Jinglei)
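Affiliations: [1] College of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China; [2] School of Basic Medical Sciences, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China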
Source: Intelligent Computer and Applications, 2023, No. 1, pp. 164-170, 177 (8 pages)
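Funding: National Science and Technology Major Project of the 13th Five-Year Plan (2018ZX09711001-009-001); Shanghai 2017 Science and Technology Innovation Action Plan (17401970900).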
Abstract: With the rapid development of biomedical research and information technology, the volume of clinical medical literature is growing exponentially, and automatically extracting medical knowledge with text mining has become an active research topic. Research on Corona Virus Disease 2019 (COVID-19) clinical texts is scarce, available corpora are small, and annotation quality is low. To address these problems, this paper combines the UMLS medical semantic network with expert definitions to formulate medical entity annotation rules, builds a named entity recognition corpus, and defines the entity recognition task. It then proposes a COVID-19 clinical text named entity recognition model based on MPNet and BiLSTM. A pre-trained language model produces vector representations of the text, addressing polysemy (one word carrying several meanings); a bidirectional long short-term memory network captures long-range dependencies in the text; finally, a conditional random field performs sentence-level sequence labeling and outputs the complete optimal label sequence. Experimental results show that the MPNet-BiLSTM-CRF model performs well on the COVID-19 clinical named entity recognition dataset.
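The abstract's final step, a conditional random field that outputs the optimal label sequence for a whole sentence, is usually decoded with the Viterbi algorithm. The Python sketch below shows that decoding for a linear-chain CRF, plus conversion of the resulting BIO tags into entity spans. It is an illustration under stated assumptions, not the authors' implementation: the tag set (`O`, `B-DIS`, `I-DIS`), the function names `viterbi_decode` and `bio_to_spans`, and all scores are hypothetical. In the described model the emission scores would come from the BiLSTM layer over MPNet embeddings, and the transition scores would be learned.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical tag set for illustration only; the paper's entity types are not listed here.
TAGS = ["O", "B-DIS", "I-DIS"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
    end: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path under a linear-chain CRF and its score.

    emissions[t][j]   -- score of tag j at token t (assumed: BiLSTM output projected to tags)
    transitions[i][j] -- score of tag i being followed by tag j
    start[j], end[j]  -- scores for a sentence starting / ending with tag j
    Assumes at least one token.
    """
    n_tags = len(start)
    # score[j]: best total score of any partial path ending in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        best_prev: List[int] = []
        for j in range(n_tags):
            # choose the previous tag that maximizes the score of arriving at tag j
            i_best = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[i_best] + transitions[i_best][j] + emissions[t][j])
            best_prev.append(i_best)
        score = new_score
        backpointers.append(best_prev)
    final = [score[j] + end[j] for j in range(n_tags)]
    last = max(range(n_tags), key=lambda j: final[j])
    # follow the back-pointers from the last token back to the first
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, final[last]


def bio_to_spans(tags: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (entity_type, start, end) spans, end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    etype: Optional[str] = None
    begin = 0
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                spans.append((etype, begin, i))
            # a stray I- tag (no matching B-) is treated as the start of a new entity
            etype, begin = (tag[2:], i) if tag != "O" else (None, i)
    return spans


if __name__ == "__main__":
    # Toy scores: per-token argmax alone would give O, I-DIS, I-DIS, an invalid BIO sequence.
    emissions = [[1.0, 0.5, 0.0], [0.0, 0.2, 1.0], [0.0, 0.0, 1.0]]
    NEG = -100.0  # large penalty that effectively forbids a transition
    transitions = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # forbid O -> I-DIS
    start, end = [0.0, 0.0, NEG], [0.0, 0.0, 0.0]  # a sentence may not start with I-DIS
    path, best = viterbi_decode(emissions, transitions, start, end)
    tags = [TAGS[i] for i in path]
    print(tags, best)          # ['B-DIS', 'I-DIS', 'I-DIS'] 2.5
    print(bio_to_spans(tags))  # [('DIS', 0, 3)]
```

The toy example shows why sentence-level decoding matters: choosing each token's best tag independently yields an ill-formed sequence, while the transition scores let the CRF pick the best globally consistent one.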
Keywords: COVID-19; named entity recognition; bidirectional long short-term memory network; conditional random field
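CLC classification numbers: TP391 [Automation and Computer Technology / Computer Application Technology]; R319 [Automation and Computer Technology / Computer Science and Technology]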