Lexical Knowledge Extraction Based on a Machine-Readable Dictionary (基于机器可读词典的词汇知识抽取)


Authors: Fan Yujun (樊玉俊)[1], Hu Yi (胡熠)[1], Lu Ruzhan (陆汝占)[1]

Affiliation: [1] Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240

Source: Computer Applications and Software (《计算机应用与软件》), 2008, No. 6, pp. 8-10 (3 pages)

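Funding: Supported by the National Natural Science Foundation of China Major Program "Basic Theory and Core Techniques of Non-Canonical Knowledge" (60496326)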

Abstract: A growing body of practice shows that lexical knowledge will be an indispensable component of future natural language processing systems. This paper uses a machine-readable dictionary as its resource. It first classifies the definition entries, then automatically generates extraction templates from an analysis of the definitions, and finally extracts lexical knowledge by template matching. The extracted results are filtered with a supervised machine learning method based on the maximum entropy model. Applied to the Applied Chinese Dictionary (《应用汉语词典》), the method achieved good extraction results.
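The template-matching and filtering steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two regex templates are hand-written stand-ins (the paper generates its templates automatically), the features and training labels are invented, and the filter is a two-class log-linear model (binary logistic regression, the two-class form of a maximum entropy classifier) fitted by plain gradient ascent. The names `TEMPLATES`, `extract`, `featurize`, `predict`, `train_maxent` and `TRAINING` are hypothetical.

```python
import math
import re

# Hypothetical templates: (relation name, regex over the definition text).
# The paper generates its templates automatically; these two are hand-written
# stand-ins so that the matching step can be shown.
TEMPLATES = [
    # "X的一种": "a kind of X"
    ("hypernym", re.compile(r"^(?P<value>[^,。;]+?)的一种")),
    # "即X" / "也叫X" / "也称X": "that is, X" / "also called X"
    ("synonym", re.compile(r"^(?:即|也叫|也称)(?P<value>[^,。;]+)")),
]


def extract(headword, definition):
    """Return (headword, relation, value) triples for every matching template."""
    triples = []
    for relation, pattern in TEMPLATES:
        match = pattern.search(definition)
        if match:
            triples.append((headword, relation, match.group("value")))
    return triples


def featurize(relation, value):
    """Toy binary features: the template used and a crude value-length bucket."""
    length_bucket = "short_value" if len(value) <= 4 else "long_value"
    return {"bias": 1.0, "tpl=" + relation: 1.0, length_bucket: 1.0}


def predict(weights, features):
    """P(candidate is correct | features) under a two-class log-linear model."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def train_maxent(samples, epochs=200, learning_rate=0.5):
    """Fit weights by stochastic gradient ascent on the conditional log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for features, label in samples:
            error = label - predict(weights, features)
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) + learning_rate * error * value
    return weights


# Invented training labels (1 = keep, 0 = discard), for illustration only.
TRAINING = [
    (featurize("hypernym", "水果"), 1),
    (featurize("synonym", "西红柿"), 1),
    (featurize("hypernym", "生长在热带和亚热带地区的常绿乔木"), 0),
    (featurize("synonym", "一种原产于南美洲的茄科植物"), 0),
]

if __name__ == "__main__":
    weights = train_maxent(TRAINING)
    entries = {"苹果": "水果的一种。", "番茄": "即西红柿。"}
    for headword, definition in entries.items():
        for triple in extract(headword, definition):
            # Keep a candidate only if the filter judges it more likely correct.
            if predict(weights, featurize(triple[1], triple[2])) >= 0.5:
                print(triple)
```

In this toy setup the filter only learns that over-long extracted values tend to be noise. A realistic filter would need richer features and hand-labelled candidates, neither of which the abstract specifies.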

Keywords: lexical knowledge; machine-readable dictionary; template extraction; maximum entropy

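Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; TP391.1 [Automation and Computer Technology: Computer Science and Technology]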

 
