Fine-grained Chinese Medical Knowledge: A Representation System and an Annotated Corpus

Cited by: 1


Authors: YANG Yang (杨洋) [1], GUAN Yi (关毅) [1], LI Xue (李雪) [1], JIANG Jingchi (姜京池), SHI Huaizhang (史怀璋) [2], LIU Xiguang (柳曦光) [3]

Affiliations: [1] Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; [2] Department of Neurosurgery, First Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang 150030, China; [3] Department of Dermatology, Heilongjiang Provincial Hospital, Harbin, Heilongjiang 150030, China

Source: Journal of Chinese Information Processing (《中文信息学报》), 2023, No. 6, pp. 52-66 (15 pages)

Funding: National Natural Science Foundation of China (62006063); Heilongjiang Provincial Postdoctoral Science Foundation (LBH-Z20015).

Abstract: To meet the need for fine-grained, shareable, and highly accurate medical knowledge, this paper proposes a knowledge representation system for Chinese medical text that integrates medical knowledge from three data sources: electronic medical records, medical books, and text from professional medical websites. The system defines 9 medical entity types and 60 entity relation types. On this basis, the authors develop an easy-to-use annotation tool, provide text annotated to the specification for each source, and construct a publicly available fine-grained annotated corpus with wide coverage and high consistency. Four clinicians annotated the textbook "Diagnostics" (《诊断学》), producing 6,526 medical entities and 4,229 relations with an inter-annotator agreement of up to 0.974. After the three sources are merged, the corpus contains 344,475 entities and 3,196,787 relations, without duplicates. The paper describes the mapping process used to fuse the data sources and the annotation rules, analyzes the text characteristics of each source, summarizes the annotation patterns, and uses application scenarios and text characteristics to show why annotating medical books is necessary. It provides an annotation standard for building Chinese medical corpora and corpus support for Chinese medical entity recognition and relation extraction.

Keywords: fine-grained annotation specification; multi-source medical text; semantic annotation; corpus construction

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
