Unstructured medical text knowledge extraction based on deep learning (基于深度学习的非结构化医学文本知识抽取)  Cited by: 2


Authors: GENG Biao (耿飙), LIANG Cheng-quan (梁成全)[3], WEI Wei (魏炜), ZHU Chang-yuan (朱长元)[1]

Affiliations: [1] College of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China; [2] School of Health Management, Suzhou Vocational Health College, Suzhou 215009, Jiangsu, China; [3] Information Section, Huadong Sanatorium, Wuxi 214065, Jiangsu, China; [4] School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2024, No. 1, pp. 177-186 (10 pages)

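Funding: China Postdoctoral Science Foundation (2021T140707); State Key Laboratory of NBC Protection for Civilian fund (SKLNBC2020-23); Suzhou Vocational Health College university-level "Lingyan" key cultivation fund (szwzy202004).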

Abstract: To address polysemy and overlapping relations, a lightweight end-to-end neural model is proposed for text data in the diabetes domain, built on a novel tagging strategy based on sequence labeling. The model adopts a head-entity-first strategy: BERT produces the input character vectors, and a BiLSTM captures sequential features and contextual dependencies. A multi-head attention mechanism is then introduced, and a CRF model yields the optimal predicted tag sequence from the mutual dependencies between adjacent tags. The goal is to convert unstructured medical text into structured data. Comprehensive experiments were carried out on the Alibaba Cloud Tianchi Chinese diabetes annotation dataset, and the results show that the proposed model achieves superior performance in medical text knowledge extraction.
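The CRF step in the abstract, choosing the best tag sequence from the dependencies between adjacent tags, is usually decoded with the Viterbi algorithm. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. The tag set (`O`, `B-Disease`, `I-Disease`), the emission scores (standing in for the BERT, BiLSTM and attention outputs) and the transition scores are all made-up assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set; the paper's actual tagging scheme is not given here.
TAGS = ["O", "B-Disease", "I-Disease"]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag] is the per-token score of `tag` at position t.
    transitions[(prev, cur)] scores moving from `prev` to `cur`;
    pairs that are not listed are treated as 0.0.
    """
    if not emissions:
        return []
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag] = best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up scores for the three characters of "糖尿病" (diabetes).
emissions = [
    {"O": 1.0, "B-Disease": 0.9, "I-Disease": 0.1},
    {"O": 0.2, "B-Disease": 0.1, "I-Disease": 1.5},
    {"O": 0.3, "B-Disease": 0.0, "I-Disease": 1.2},
]
# Made-up transition scores: an I- tag directly after O is heavily penalized.
transitions = {
    ("O", "I-Disease"): -10.0,
    ("B-Disease", "I-Disease"): 1.0,
    ("I-Disease", "I-Disease"): 0.5,
}

# Per-token argmax ignores the dependencies between adjacent tags.
greedy = [max(TAGS, key=lambda tag: scores[tag]) for scores in emissions]
decoded = viterbi_decode(emissions, transitions, TAGS)
print("greedy :", greedy)
print("viterbi:", decoded)
```

With these scores, per-token argmax starts the entity with an `I-Disease` tag directly after `O`, which is not a valid BIO sequence, while the Viterbi path begins the span with `B-Disease`. A trained CRF would learn the transition scores from data and would normally also include start and end transitions, which this sketch omits.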

Keywords: deep learning; unstructured text; medical text; knowledge extraction; entity recognition; relation extraction; sequence labeling

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
