Authors: YANG Bing (杨兵); NIE Tie-zheng (聂铁铮); SHEN De-rong (申德荣); KOU Yue (寇月); YU Ge (于戈) (School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China)
Affiliation: [1] School of Computer Science and Engineering, Northeastern University (东北大学计算机科学与工程学院)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2019, No. 7, pp. 1479-1485 (7 pages)
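Funding: Supported by the National Key Research and Development Program of China (2018YFB1003404); the National Natural Science Foundation of China (61672142, 61402213, U1435216); and the Fundamental Research Funds for the Central Universities (N150408001-3, N150404013)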
Abstract: Medical texts are an important information carrier in the medical field and provide key data for clinical diagnosis and pathological research. However, text written in natural language is usually unstructured, which makes it hard for machines to understand and process automatically. Chinese medical text is especially challenging for structured information extraction: it is highly specialized, requires extensive domain knowledge, and is written largely in short sentences. This paper therefore proposes a method for extracting structured information from medical text. The method first uses text clustering and keyword extraction to obtain the expression terms commonly used in medical descriptions, then uses the resulting medical term lexicon to assist Chinese word segmentation and improve segmentation quality on Chinese medical text. Next, it analyzes the semantic dependencies between words and builds a syntactic dependency tree. Finally, it identifies and extracts the key indicators and their corresponding values from that tree, yielding structured key-value data. Experiments on real medical imaging report texts show that the method improves Chinese word segmentation quality on medical text, with accuracy of up to 98.24%, and performs well on structured information extraction, with accuracy of up to 83.76% and recall of up to 88.09%. The method covers a variety of dependency grammars and is broadly applicable.
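To illustrate why a domain term lexicon can help segmentation, here is a minimal sketch. It is not the authors' implementation: the record does not say which segmenter the paper uses, so this assumes plain forward maximum matching, and the function name `segment` and the two toy lexicons are hypothetical, made up for this example.

```python
def segment(text, lexicon):
    """Forward maximum matching: at each position take the longest
    lexicon entry that matches, falling back to a single character."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy lexicons (assumptions for illustration only).
GENERAL_LEXICON = {"纹理", "增多", "未见", "异常"}
# Hypothetical medical terms, standing in for terms mined by
# clustering and keyword extraction.
MEDICAL_LEXICON = GENERAL_LEXICON | {"双肺纹理", "肺纹理"}

if __name__ == "__main__":
    report = "双肺纹理增多"  # "increased markings in both lungs"
    print(segment(report, GENERAL_LEXICON))
    print(segment(report, MEDICAL_LEXICON))
```

With only the general lexicon, the medical term is broken into single characters and a generic word; once the term is in the lexicon it is kept as one token, which gives later dependency analysis a cleaner unit to attach indicator values to.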
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]