Research on entity relationship extraction of Chinese medical literature and application in diabetes medical literature  Cited by: 6

Authors: FAN Zhiyuan (范智渊), HE Xuan (何璇)[1,2], LIANG Pin (梁品), LU Jing (吕晶), KANG Yan (康雁)[3]

Affiliations: [1] College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110819, P.R. China; [2] Neusoft Research of Intelligent Healthcare Technology, Co. Ltd., Shenyang 110819, P.R. China; [3] College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, Guangdong 518118, P.R. China

Source: Journal of Biomedical Engineering (《生物医学工程学杂志》), 2021, No. 3, pp. 563-573 (11 pages)

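Funding: National Natural Science Foundation of China, Young Scientists Fund (61806048); Open Project of Neusoft Research of Intelligent Healthcare Technology, Co. Ltd. (NRIHTOP1802).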

Abstract: Medical literature contains a wealth of valuable medical knowledge. Research on entity relationship extraction from medical literature has made great progress, but as the volume of medical literature grows exponentially, annotating medical text has become a major bottleneck. To avoid the long time and heavy workload of manual annotation, researchers have proposed distant supervision for labeling, but this method introduces a large amount of noise. This paper proposes a novel neural network architecture based on convolutional neural networks (CNNs) that addresses this noise problem. The model uses a multi-window CNN to extract sentence features automatically; once sentence vectors are obtained, an attention mechanism selects the sentences that are informative for the true relation. In particular, an entity type (ET) embedding method is proposed that adds entity type features for relation classification. Because training text inevitably contains labeling errors, the attention is applied at the sentence level for relation extraction. Experiments on 968 diabetes medical articles show that, compared with the baseline models, the proposed model performs well on medical literature, reaching an F1 score of 93.15%. Finally, the 11 extracted relation types were stored as triples, and these triples were used to build a complex-relationship medical knowledge graph with 33,347 nodes and 43,686 relation edges. The experimental results show that the algorithm used in this paper clearly outperforms the best baseline system for relation extraction.
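The abstract names three components: a multi-window CNN sentence encoder, sentence-level attention over the sentences that mention one entity pair (the "bag" used in distant supervision), and entity type (ET) embeddings added for classification. The NumPy forward pass below is a minimal sketch of how such pieces typically fit together, not the authors' implementation. All of the following are hypothetical assumptions: window sizes (3, 4, 5), 64 filters per window, tanh activation with max-pooling over time, a bilinear attention score s_i·A·q_r with one query vector q_r per relation (in the style of selective attention over sentence bags), and ET embeddings of the head and tail entities concatenated to the attended bag vector just before the output layer. Weights are random, so the outputs are meaningful only as shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hyperparameters (not taken from the paper).
VOCAB, D_WORD = 1000, 50            # vocabulary size, word-embedding dim
N_TYPES, D_TYPE = 8, 10             # number of entity types, ET-embedding dim
WINDOWS, N_FILTERS = (3, 4, 5), 64  # CNN window sizes, filters per window
N_REL = 11                          # 11 relation types as in the abstract (a real setup may add an NA class)

word_emb = rng.normal(0, 0.1, (VOCAB, D_WORD))
type_emb = rng.normal(0, 0.1, (N_TYPES, D_TYPE))
conv_w = {k: rng.normal(0, 0.1, (N_FILTERS, k * D_WORD)) for k in WINDOWS}
conv_b = {k: np.zeros(N_FILTERS) for k in WINDOWS}
D_SENT = N_FILTERS * len(WINDOWS)
rel_query = rng.normal(0, 0.1, (N_REL, D_SENT))  # one attention query vector per relation
attn_A = np.eye(D_SENT)                          # bilinear attention weight (identity for simplicity)
out_w = rng.normal(0, 0.1, (N_REL, D_SENT + 2 * D_TYPE))
out_b = np.zeros(N_REL)


def encode_sentence(token_ids):
    """Multi-window CNN: convolve with each window size, max-pool over time, concatenate."""
    x = word_emb[token_ids]                                   # (T, D_WORD)
    feats = []
    for k in WINDOWS:
        pad = np.zeros((k - 1, D_WORD))
        xp = np.vstack([pad, x, pad])                         # pad so short sentences still yield windows
        windows = np.stack([xp[i:i + k].ravel() for i in range(len(xp) - k + 1)])
        h = np.tanh(windows @ conv_w[k].T + conv_b[k])        # (T + k - 1, N_FILTERS)
        feats.append(h.max(axis=0))                           # max-pooling over time
    return np.concatenate(feats)                              # (D_SENT,)


def softmax(z):
    z = z - z.max()                                           # numerical stability
    e = np.exp(z)
    return e / e.sum()


def bag_logits(bag, head_type, tail_type):
    """Sentence-level attention over a bag, then classify with ET features appended."""
    S = np.stack([encode_sentence(s) for s in bag])           # (n_sent, D_SENT)
    et = np.concatenate([type_emb[head_type], type_emb[tail_type]])
    logits = np.empty(N_REL)
    alphas = np.empty((N_REL, len(bag)))
    for r in range(N_REL):
        scores = S @ attn_A @ rel_query[r]                    # e_i = s_i A q_r
        alpha = softmax(scores)                               # attention weight per sentence
        bag_vec = alpha @ S                                   # weighted sum of sentence vectors
        logits[r] = out_w[r] @ np.concatenate([bag_vec, et]) + out_b[r]
        alphas[r] = alpha
    return logits, alphas


# Usage: one entity pair mentioned in three (token-id) sentences.
bag = [[5, 17, 42, 8, 99], [3, 42, 7], [11, 12, 13, 14, 15, 16]]
logits, alphas = bag_logits(bag, head_type=1, tail_type=4)
print(logits.shape, alphas.shape)  # (11,) (11, 3)
```

Attending separately for each candidate relation lets a mislabeled sentence receive a low weight for the relation it was wrongly tagged with, which is the noise-reduction motivation the abstract gives for sentence-level attention.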
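The final step in the abstract, storing extracted relations as (head, relation, tail) triples and assembling them into a knowledge graph, can be sketched with plain dictionaries and sets. The entity names and relation labels below are hypothetical placeholders, not data from the paper, and collapsing duplicate triples into one edge is an assumption; the sketch only shows how node and edge counts like those reported would be derived from a triple set.

```python
from collections import defaultdict


def build_graph(triples):
    """Build a directed, relation-labeled graph from (head, relation, tail) triples.

    Assumption: identical triples extracted from different sentences are
    collapsed, so each distinct fact becomes exactly one edge.
    """
    edges = set(triples)
    nodes = set()
    adjacency = defaultdict(list)  # head -> list of (relation, tail)
    for head, rel, tail in edges:
        nodes.add(head)
        nodes.add(tail)
        adjacency[head].append((rel, tail))
    return nodes, edges, adjacency


# Hypothetical triples for illustration only.
triples = [
    ("diabetes", "has_symptom", "polyuria"),
    ("diabetes", "treated_by", "metformin"),
    ("metformin", "has_side_effect", "nausea"),
    ("diabetes", "treated_by", "metformin"),  # same fact extracted from another sentence
]
nodes, edges, adjacency = build_graph(triples)
print(len(nodes), len(edges))  # 4 3
```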

Keywords: medical literature; entity relationship extraction; convolutional neural network; knowledge graph; diabetes

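Classification (CLC): TP391.1 [Automation and Computer Technology: Computer Application Technology]; R-05 [Automation and Computer Technology: Computer Science and Technology]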
