Source: Computer Applications and Software (《计算机应用与软件》), 2008, Issue 6, pp. 8-10 (3 pages)
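Funding: Supported by the Major Program of the National Natural Science Foundation of China, "Basic Theory and Core Technologies of Non-canonical Knowledge" (非规范知识的基本理论和核心技术, 60496326)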
Abstract: A growing body of practice shows that lexical knowledge will be an indispensable component of future natural language processing systems. This paper uses a machine-readable dictionary as its resource and extracts lexical knowledge automatically. It first classifies the definition entries, then automatically generates extraction templates from an analysis of the definitions, and finally extracts lexical knowledge by template matching. The extracted results are filtered with a supervised machine learning method based on the maximum entropy model. Applied to the Applied Chinese Dictionary (《应用汉语词典》), the method achieved good extraction results.
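The template-generation and template-matching steps can be illustrated with a small Python sketch. It is not the paper's algorithm. The toy definitions, the seed (headword, hypernym) pairs, the fixed context window, and the function names `induce_templates` and `extract` are all hypothetical. The definition-classification step is omitted, and hypernyms stand in for "lexical knowledge" in general. A template is induced by replacing a known hypernym in a seed definition with a slot and keeping a few characters of context on each side; every template is then matched against every definition.

```python
import re
from collections import Counter

# Hypothetical toy entries standing in for a machine-readable dictionary:
# headword -> definition text.
DEFINITIONS = {
    "梨": "常见的一种水果。",
    "麻雀": "体形小的一种鸟。",
    "香蕉": "产于热带的一种水果。",
    "燕子": "善于飞行的一种鸟。",
    "跑": "两只脚交替向前移动。",
}

# Hypothetical seed pairs (headword, known hypernym) used to induce templates.
SEEDS = [("梨", "水果"), ("麻雀", "鸟")]

# Characters that a template slot is not allowed to cross.
PUNCT = "，。、；"


def induce_templates(defs, seeds, left=2, right=1):
    """Turn each seed definition into a (prefix, suffix) template by replacing
    the known hypernym with a slot and keeping `left`/`right` characters of
    context. Returns a Counter of template -> number of supporting seeds."""
    counts = Counter()
    for word, hyper in seeds:
        text = defs[word]
        pos = text.find(hyper)
        if pos < 0:
            continue  # hypernym not present in this definition; no template
        prefix = text[max(0, pos - left):pos]
        end = pos + len(hyper)
        suffix = text[end:end + right]
        counts[(prefix, suffix)] += 1
    return counts


def extract(defs, templates, min_support=1):
    """Match every template with enough support against every definition and
    return headword -> set of extracted slot fillers (candidate hypernyms)."""
    found = {}
    for (prefix, suffix), support in templates.items():
        if support < min_support:
            continue
        # The slot is the shortest non-empty run of non-punctuation characters.
        slot = "([^" + PUNCT + "]+?)"
        pattern = re.compile(re.escape(prefix) + slot + re.escape(suffix))
        for word, text in defs.items():
            for match in pattern.finditer(text):
                found.setdefault(word, set()).add(match.group(1))
    return found


templates = induce_templates(DEFINITIONS, SEEDS)
candidates = extract(DEFINITIONS, templates)
print(templates)   # Counter({('一种', '。'): 2})
print(candidates)  # {'梨': {'水果'}, '麻雀': {'鸟'}, '香蕉': {'水果'}, '燕子': {'鸟'}}
```

Two seeds yield the same template, which then recovers hypernyms for headwords that were not seeds. A real system would need many more templates and a way to reject noisy matches, which is what a filtering step addresses.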
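The filtering step can be sketched as a two-class conditional maximum entropy classifier, p(y | x) = exp(Σ_f w[f, y]) / Z(x), trained by batch gradient ascent on the L2-regularised conditional log-likelihood. This is an assumed illustration, not the paper's feature set or training procedure. The labels `keep`/`drop`, the feature strings (source template, candidate length, part of speech), the toy training data, the candidate pairs, and all hyperparameters are hypothetical.

```python
import math
from collections import defaultdict

LABELS = ("keep", "drop")

# Hypothetical training data: features of an extracted candidate -> gold label.
# "tpl:" = template that produced it, "len:" = length in characters,
# "pos:" = part of speech, assumed to come from an external tagger.
TRAIN = [
    ({"tpl:一种_。", "len:2", "pos:noun"}, "keep"),
    ({"tpl:一种_。", "len:1", "pos:noun"}, "keep"),
    ({"tpl:一种_。", "len:4", "pos:verb"}, "drop"),
    ({"tpl:的_。", "len:3", "pos:verb"}, "drop"),
]


def probs(weights, feats):
    """Conditional maxent model: p(y | x) = exp(sum_f w[f, y]) / Z(x)."""
    scores = {y: sum(weights[(f, y)] for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train(data, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularised conditional log-likelihood.
    Update for w[f, y]: lr * (observed count - expected count - l2 * w) / N."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in data:
            p = probs(weights, feats)
            for f in feats:
                grad[(f, gold)] += 1.0       # empirical feature count
                for y in LABELS:
                    grad[(f, y)] -= p[y]     # model-expected feature count
        for key in set(grad) | set(weights):
            weights[key] += lr * (grad[key] - l2 * weights[key]) / len(data)
    return weights


def filter_candidates(weights, candidates, threshold=0.5):
    """Keep the candidate pairs whose p(keep | features) reaches the threshold."""
    return [pair for pair, feats in candidates
            if probs(weights, feats)["keep"] >= threshold]


weights = train(TRAIN)
candidates = [
    (("香蕉", "水果"), {"tpl:一种_。", "len:2", "pos:noun"}),
    (("跑", "交替向前"), {"tpl:的_。", "len:4", "pos:verb"}),
]
print(filter_candidates(weights, candidates))  # [('香蕉', '水果')]
```

With two labels and indicator features this model is equivalent to binary logistic regression. Dedicated maximum entropy toolkits typically train with GIS/IIS or quasi-Newton methods such as L-BFGS; plain gradient ascent is used here only to keep the sketch free of dependencies.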
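Classification codes: TP311.13 [Automation and Computer Technology / Computer Software and Theory]; TP391.1 [Automation and Computer Technology / Computer Science and Technology]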