Extracting Medical Entity Relationships with Domain-Specific Knowledge and Distant Supervision  Cited by: 5


Authors: Jing Shenqi[1,2,3]; Zhao Youlin

Affiliations: [1] School of Information Management, Nanjing University, Nanjing 210023, China; [2] School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China; [3] Center for Data Management, The First Affiliated Hospital of Nanjing Medical University (Jiangsu Province Hospital), Nanjing 210096, China

Source: Data Analysis and Knowledge Discovery, 2022, Issue 6, pp. 105-114 (10 pages)

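Funding: This work is one of the research outcomes of the National Key R&D Program of China (Grant No. 2018YFC1314900) and the Key R&D Program of Jiangsu Province (Grant No. BE2020721).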

Abstract: [Objective] Traditional medical relation extraction methods suffer from high data annotation costs and are prone to incorrect labels. To address this, the paper proposes a distantly supervised model for medical entity relation extraction that incorporates medical domain knowledge. [Methods] The model uses a multi-instance strategy to reduce the impact of noise in distantly supervised labeled data, encodes the labeled text with the pre-trained language model MedicalBERT, and uses entity descriptions from a medical knowledge base as background knowledge that provides supervision signals for relation extraction, improving the accuracy of the semantic encoding of entities in the text. [Results] Compared with existing models, the proposed model improves precision by up to 5.4%, recall by up to 2.5%, and F1 by up to 4.1%. In addition, the F1 score for extracting complication relations reaches 93.8%. [Limitations] The model mainly applies to sentence-level relation extraction; its performance in multi-sentence settings has not yet been examined. [Conclusions] The distantly supervised model with medical domain knowledge extracts medical entity relations effectively and can serve as a reference for research on medical relation extraction.
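
The abstract does not specify how the multi-instance strategy is implemented. As a rough illustration only, the sketch below shows one common variant: sentences that mention both entities of a knowledge-base pair are grouped into a bag (distant supervision), and the bag is scored by its best sentence ("at-least-one" pooling). The toy knowledge base, the cue phrases, and the function names `build_bags`, `toy_instance_score`, and `bag_score` are all hypothetical; in the paper's setting the per-sentence score would come from an encoder such as MedicalBERT, not from keyword matching.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
TOY_KB = {
    ("diabetes", "retinopathy"): "complication",
    ("diabetes", "metformin"): "treated_by",
}

# Hypothetical cue phrases standing in for a learned sentence encoder.
TOY_CUES = {
    "complication": ("leads to", "causes"),
    "treated_by": ("drug for", "treats"),
}


def build_bags(sentences, kb):
    """Distant supervision: every sentence mentioning both entities of a KB
    pair is placed in that pair's bag and inherits the KB relation label."""
    bags = defaultdict(list)
    for sentence in sentences:
        text = sentence.lower()
        for (head, tail), relation in kb.items():
            if head in text and tail in text:
                bags[(head, tail, relation)].append(sentence)
    return dict(bags)


def toy_instance_score(sentence, relation):
    """Stand-in scorer: 1.0 if a cue phrase for the relation appears, else 0.1."""
    text = sentence.lower()
    return 1.0 if any(cue in text for cue in TOY_CUES[relation]) else 0.1


def bag_score(bag_sentences, relation):
    """At-least-one pooling: the bag is as confident as its best sentence,
    so noisy sentences in the same bag do not lower the score."""
    return max(toy_instance_score(s, relation) for s in bag_sentences)
```

With a sentence such as "Diabetes often leads to retinopathy." in the same bag as a sentence that merely co-mentions both terms, the bag keeps the high score of the first sentence, which is the sense in which a multi-instance strategy dampens label noise from distant supervision.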

Keywords: medical relation extraction; distant supervision; medical domain knowledge; pre-trained language model

Classification No.: G302 [Culture and Science]; R-02 [Medicine and Health]

 
