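Authors: 陈德华[1] 刘茜茜[1] 乐嘉锦[1] 潘乔[1] 朱立峰[2]
Affiliations: [1] School of Computer Science and Technology, Donghua University, Shanghai 201620; [2] Computer Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 201620
Source: 《计算机与现代化》 (Computer and Modernization), 2016, No. 4, pp. 1-6 (6 pages)
Funding: Shanghai Municipal Science and Technology Commission, Science and Technology Innovation Action Plan project (15511106900)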
Abstract: Current approaches to structuring medical text data mostly rely on general-purpose word segmentation tools or medical knowledge bases. General-purpose segmenters, however, recognize specialized terminology poorly, and the standardization of Chinese medical terminology in China remains immature. To address this, the paper proposes a statistics-based method for structuring examination-report text (镜检文本 in the original). Starting from clustered text, the method segments the text using breakpoint words and overlapping strings, uses statistics over the resulting word strings to obtain keywords and word-category information, and then expands the vocabulary to produce a final lexicon that serves as the dictionary. A dictionary-based bidirectional maximum matching algorithm then segments the text, and added negation-detection rules yield the structured data. Experiments show that the medical lexicon obtained this way reaches 80% accuracy, and that structured data can be produced without relying on a segmentation tool.
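The abstract names two concrete steps: dictionary-based bidirectional maximum matching, followed by negation-detection rules. The record does not give the authors' implementation, so the following is only a minimal sketch of the general technique under stated assumptions. The function names, the tie-breaking rule (fewer tokens, then fewer single-character tokens, then the backward result), the negation cues, and the clause delimiters are all hypothetical choices made for illustration, not details taken from the paper.

```python
# Sketch of dictionary-based bidirectional maximum matching plus a simple
# negation rule. All names, cues, and tie-break rules here are assumptions.

def forward_max_match(text, dictionary, max_len):
    """Scan left to right, always taking the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len):
    """Scan right to left, always taking the longest dictionary word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def bidirectional_max_match(text, dictionary):
    """Run both scans and keep the result a common heuristic prefers."""
    max_len = max((len(word) for word in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd  # fewer tokens wins

    def singles(tokens):
        return sum(1 for tok in tokens if len(tok) == 1)

    # Same token count: fewer single-character tokens wins; on a full tie,
    # fall back to the backward result (an assumed convention).
    return fwd if singles(fwd) < singles(bwd) else bwd


NEGATION_CUES = {"未见", "无"}                 # assumed cue words
CLAUSE_BREAKS = {",", "。", ";", ",", ";"}   # assumed clause delimiters


def extract_findings(tokens, terms):
    """Return (term, present) pairs; a cue negates terms until the clause ends."""
    findings, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_BREAKS:
            negated = False
        elif tok in NEGATION_CUES:
            negated = True
        elif tok in terms:
            findings.append((tok, not negated))
    return findings
```

With a hypothetical lexicon `{"胃窦", "粘膜", "充血", "溃疡", "未见"}`, the made-up sentence `"胃窦粘膜充血,未见溃疡"` segments into six tokens, and `extract_findings` with the terms `{"充血", "溃疡"}` marks 充血 as present and 溃疡 as absent. Scoping negation to the current clause is the simplest rule that handles this case; real reports would likely need a richer rule set.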
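Keywords: medical text data; text data structuring; statistics; word segmentation; bidirectional maximum matching
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]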