Authors: 王玉荣 (WANG Yurong); 林民 (LIN Min) [2,3,4,6]; 胡其吐 (HU Qitu); 白双成 (BAI Shuangcheng) [3,4]; 包龙杰 (BAO Longjie) (School of Mathematical Sciences, Inner Mongolia Normal University, Hohhot 010022, China; Inner Mongolia Autonomous Region Applied Mathematics Centre, Hohhot 010022, China; Key Laboratory of Infinite-dimensional Hamiltonian Systems and Algorithmic Applications of the Ministry of Education, Hohhot 010022, China; School of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China; Inner Mongolia International Mongolian Medical Hospital, Hohhot 010022, China; Inner Mongolia Electronic Information Vocational and Technical College, Hohhot 010060, China)
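Affiliations: [1] School of Mathematical Sciences, Inner Mongolia Normal University, Hohhot 010022 [2] Inner Mongolia Autonomous Region Applied Mathematics Centre, Hohhot 010022 [3] Key Laboratory of Infinite-dimensional Hamiltonian Systems and Algorithmic Applications of the Ministry of Education, Hohhot 010022 [4] School of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022 [5] Inner Mongolia International Mongolian Medical Hospital, Hohhot 010021 [6] Inner Mongolia Electronic Information Vocational and Technical College, Hohhot 010060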
Source: 《中央民族大学学报(自然科学版)》 Journal of Minzu University of China (Natural Sciences Edition), 2025, No. 1, pp. 64-71 (8 pages)
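Funding: National Natural Science Foundation of China (62266033); Open Project of the Key Laboratory of Infinite-dimensional Hamiltonian Systems and Algorithmic Applications of the Ministry of Education (2023KFZD03); Inner Mongolia Normal University Graduate Research Innovation Fund (CXJJB23011); Natural Science Foundation of Inner Mongolia Autonomous Region (2023ZD10).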
Abstract: Named Entity Recognition (NER) is a basic task in information extraction that aims to identify entities of predefined types in unstructured text. Because Mongolian is a low-resource language and annotated clinical data for Mongolian medicine are scarce, NER on Mongolian medicine electronic medical records performs relatively poorly. In recent years, prompt learning has performed exceptionally well in few-shot settings, effectively bridging pre-trained models and downstream tasks. Targeting the few-shot scenario of Mongolian medicine, this paper first constructs MDNER, a named entity recognition dataset of Mongolian medicine electronic medical records. Following Chinese electronic medical record NER datasets, MDNER defines five major entity types. The annotated entity counts are: 3829 symptoms and signs, 1489 examinations and tests, 2071 disease diagnoses, 934 drugs, and 3734 treatments. The paper then proposes MRC-GP, an NER framework for Mongolian medicine electronic medical records based on Machine Reading Comprehension (MRC) and Global Pointer (GP). It converts the traditional sequence labelling task into a span extraction task based on prompt learning, which reduces the number of label types and lowers the model's learning difficulty, thereby reducing the model's dependence on data while also solving the entity nesting problem. Experiments show that MRC-GP is more effective on low-resource data than traditional NER methods.
Keywords: Mongolian medicine; machine reading comprehension; prompt learning; named entity recognition; dataset construction
Classification code: TN391 [Electronics and Telecommunications: Physical Electronics]
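Illustrative sketch: the abstract describes recasting sequence labelling as prompt-based span extraction, with one query per entity type and a pointer-style decoder that can return nested entities. The Python below is a minimal sketch of that general idea only, not the authors' implementation. The query wording, the names `QUERIES`, `build_mrc_input`, and `decode_spans`, the English toy tokens, and the toy scores are all assumptions made for illustration; a real system would obtain the score matrix from a trained model.

```python
from typing import Dict, List, Tuple

# Hypothetical prompt templates, one per entity type listed in the abstract.
# The wording is an assumption; the record does not give the actual prompts.
QUERIES: Dict[str, str] = {
    "symptom_sign": "Find all symptoms and signs in the text.",
    "examination_test": "Find all examinations and tests in the text.",
    "disease_diagnosis": "Find all disease diagnoses in the text.",
    "drug": "Find all drugs in the text.",
    "treatment": "Find all treatments in the text.",
}


def build_mrc_input(entity_type: str, tokens: List[str]) -> List[str]:
    """Prepend the type-specific query to the context tokens, MRC style."""
    query_tokens = QUERIES[entity_type].split()
    return ["[CLS]", *query_tokens, "[SEP]", *tokens, "[SEP]"]


def decode_spans(
    scores: List[List[float]], threshold: float = 0.0
) -> List[Tuple[int, int]]:
    """Return every (start, end) pair with start <= end scoring above threshold.

    Each candidate span is judged independently, so overlapping and nested
    spans can be returned together, unlike a single BIO tag sequence.
    """
    n = len(scores)
    return [
        (i, j)
        for i in range(n)
        for j in range(i, n)
        if scores[i][j] > threshold
    ]


# Toy example with made-up scores (not model output).
tokens = ["chronic", "gastritis", "with", "pain"]
scores = [[-1.0] * len(tokens) for _ in tokens]
scores[0][1] = 2.0  # outer span: "chronic gastritis"
scores[1][1] = 1.5  # nested span: "gastritis"

model_input = build_mrc_input("disease_diagnosis", tokens)
spans = decode_spans(scores)
entities = [" ".join(tokens[i : j + 1]) for i, j in spans]
print(entities)
```

Because the entity type is carried by the query rather than by the label set, the decoder only has to answer "is this span an entity for the current query", which is one way to read the abstract's claim about reducing the number of label types.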