Entity Recognition and Labeling for Medical Literature Based on Neural Network  Cited by: 2

Authors: Zhao Ruijie (赵蕊洁); Tong Xinyu (佟昕瑀); Liu Xiaohua (刘小桦); Lu Yonghe (路永和) [1]

Affiliation: [1] School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2022, No. 9, pp. 100-112 (13 pages)

Funding: This work is one of the research outcomes of a Guangzhou Science and Technology Program project (Project No. 202002020036).

Abstract: [Objective] This paper proposes a new entity recognition model that aims to improve medical entity recognition, support the mining of new medical knowledge, and increase the utilization of medical scientific papers. [Methods] We constructed a medical entity recognition model based on Attention-BiLSTM-CRF and tested it separately on two public datasets, GENIA Term Annotation Task and BioCreative II Gene Mention Tagging. We then used the model to annotate entities in the abstracts of biomedical papers. [Results] The proposed model outperformed the other baseline models. On the two datasets, its F1 values were 81.57% and 84.23%, and its accuracy rates were 92.51% and 97.85%, respectively. Its advantage was greater when the data were imbalanced. [Limitations] The data volume and application scope of the entity labeling experiments are relatively limited. [Conclusions] The Attention-BiLSTM-CRF-based medical entity recognition model can improve entity recognition performance and support the mining of new medical knowledge.

Keywords: biomedical entity recognition; entity labeling; neural network; attention mechanism

Classification code: G350 [Cultural Sciences: Information Science]

 
