Chemical-Induced Disease Relation Extraction Based on Biomedical Literature  Cited by: 5


Authors: Li Zhiheng (李智恒)[1], Gui Yingyi (桂颖溢), Yang Zhihao (杨志豪)[1], Lin Hongfei (林鸿飞)[1], Wang Jian (王健)[1]

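Affiliations: [1] School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024; [2] School of Optoelectronics, Beijing Institute of Technology, Beijing 100081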

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2018, No. 1, pp. 198-206 (9 pages)

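Funding: National Natural Science Foundation of China (61272373; 61340020; 61572102; 61572098); Program for New Century Excellent Talents in University (NCET-13-0084); Fundamental Research Funds for the Central Universities (DUT14YQ213)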

Abstract: Side-effect relations between chemicals and diseases have drawn increasing attention to chemical-disease relations (CDRs). Automatically extracting chemical-induced disease (CID) relations from the biomedical literature can support biocuration, new drug discovery, and drug safety surveillance. This paper presents CDRExtractor, a system that extracts CID relations from biomedical literature at both the sentence level and the document level. To extract CID relations that occur within a single sentence, a sentence-level training set is first annotated manually and used to train a sentence-level classifier. To improve this classifier, a semi-supervised Co-training algorithm treats the feature kernel and the graph kernel as two independent views and exploits a large unlabeled corpus alongside the small manually annotated training set. CDRExtractor then trains a document-level classifier on document-level features of each chemical-disease pair to extract CID relations that span sentences. Finally, post-processing rules are applied to the union of the two classifiers' outputs to generate the final results. Experiments show that CDRExtractor achieves an F-score of 67.72% on the test set of the CID subtask of the BioCreative V CDR task.
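The Co-training step in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract names a feature kernel and a graph kernel as the two views, but this record does not describe the classifiers themselves. Here each view is assumed to be a plain numeric feature vector, and a small nearest-centroid classifier (a hypothetical stand-in, as are the names `CentroidClassifier`, `fit_view`, and `co_train`) takes the place of the kernel-based models. In each round, both view classifiers are trained on the current labeled set, and each then moves its most confident unlabeled examples into that set with a predicted label.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, Vector]   # (view A features, view B features)
Labeled = Tuple[Example, int]     # (example, class label)


class CentroidClassifier:
    """Stand-in classifier (assumption): one mean vector per class."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        self.centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x: Vector) -> Tuple[int, float]:
        # Squared distance to every class centroid.
        dists = {lab: sum((a - b) ** 2 for a, b in zip(x, c))
                 for lab, c in self.centroids.items()}
        ranked = sorted(dists, key=lambda lab: dists[lab])
        # Confidence = gap between the runner-up and the nearest centroid.
        margin = dists[ranked[1]] - dists[ranked[0]] if len(ranked) > 1 else 0.0
        return ranked[0], margin


def fit_view(labeled: List[Labeled], view: int) -> CentroidClassifier:
    """Train a classifier that only sees one of the two views."""
    X = [example[view] for example, _ in labeled]
    y = [label for _, label in labeled]
    return CentroidClassifier().fit(X, y)


def co_train(labeled: List[Labeled], unlabeled: List[Example],
             rounds: int = 3, per_round: int = 1):
    """Grow the labeled set with each view's most confident predictions."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Both view classifiers are trained on the same labeled set.
        classifiers = [fit_view(labeled, 0), fit_view(labeled, 1)]
        for view, clf in enumerate(classifiers):
            scored = [(clf.predict_with_confidence(ex[view]), ex) for ex in pool]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            for (label, _margin), ex in scored[:per_round]:
                labeled.append((ex, label))   # pseudo-labeled example
            pool = [ex for _, ex in scored[per_round:]]
    return fit_view(labeled, 0), fit_view(labeled, 1), labeled


# Toy data (invented for illustration): values near 0 are class 0, near 1 are class 1.
seed = [(([0.0], [0.1]), 0), (([1.0], [0.9]), 1)]
pool = [([0.1], [0.0]), ([0.9], [1.0]), ([0.2], [0.2]), ([0.8], [0.7])]
clf_a, clf_b, grown = co_train(seed, pool)
```

In this toy run the labeled set grows from two seed examples to all six, and the two returned classifiers can then be applied to new candidate pairs. How CDRExtractor selects examples, sets confidence thresholds, or combines the two views at prediction time is not stated in this record, so those choices in the sketch are assumptions.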

Keywords: information extraction; text mining; semi-supervised learning; Co-training algorithm; chemical-disease relation

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
