Coarse-Grained Word Sense Disambiguation Using Features Described in the Lexicon  (Cited by: 10)

Authors: Wu Yunfang [1], Jin Peng [1], Guo Tao [1]

Affiliation: [1] Institute of Computational Linguistics, Peking University, Beijing 100871

Source: Journal of Chinese Information Processing (《中文信息学报》), 2007, No. 2, pp. 3-8 (6 pages)

Funding: National 973 Program of China (2004CB318102)

Abstract: Drawing on the attribute features that the Grammatical Knowledge-base of Contemporary Chinese (《现代汉语语法信息词典》) records for polysemous words, this paper reports a coarse-grained automatic word sense disambiguation (WSD) experiment on 4,996 homograph-level instances of 155 words in the sense-tagged People's Daily corpus. A Naive Bayes (NB) classifier is tested as a statistical method for comparison. The lexicon-feature-based method reaches 90% precision at the homograph level, comparable to the NB classifier, but its recall is low. It has two main advantages: 1) it is not affected by the size of the sense-tagged corpus; 2) for certain words it disambiguates with 100% precision. The paper also discusses which features suit different parts of speech in Chinese WSD, and argues that the sense features described in the lexicon are worth including in WSD.
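The pattern reported in the abstract (high precision, low recall) is typical of a rule-style disambiguator that answers only when a lexicon feature matches and abstains otherwise. The following is a minimal sketch of that idea, not the authors' implementation. The lexicon entries, the feature names (`prev_pos=q`, `next_word_class=money_or_time`), the example word 花, and the test instances are all hypothetical, and recall is assumed to be the number of correct answers divided by the total number of instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenseRule:
    sense: str
    required: frozenset  # context features that must all be present


# Hypothetical lexicon entries, for illustration only; they are not taken
# from the Grammatical Knowledge-base of Contemporary Chinese.
LEXICON = {
    "花": [
        SenseRule("spend", frozenset({"next_pos=n", "next_word_class=money_or_time"})),
        SenseRule("flower", frozenset({"prev_pos=q"})),  # preceded by a classifier
    ],
}


def disambiguate(word, context_features):
    """Return a sense only if exactly one rule matches; otherwise abstain (None)."""
    matches = {
        rule.sense
        for rule in LEXICON.get(word, [])
        if rule.required <= context_features
    }
    return matches.pop() if len(matches) == 1 else None


def precision_recall(predictions, gold):
    """Precision over answered instances; recall over all instances (assumed definition)."""
    answered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    correct = sum(p == g for p, g in answered)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical instances: (word, observed context features, gold sense).
instances = [
    ("花", {"prev_pos=q"}, "flower"),
    ("花", {"next_pos=n", "next_word_class=money_or_time"}, "spend"),
    ("花", {"prev_pos=a"}, "flower"),  # no rule fires, so the system abstains
    ("花", {"next_pos=n"}, "spend"),   # only a partial match, so it abstains
]

predictions = [disambiguate(word, feats) for word, feats, _ in instances]
gold = [sense for _, _, sense in instances]
precision, recall = precision_recall(predictions, gold)
```

Because the matcher abstains rather than guessing, every answer it gives in this toy data is correct while half of the instances go unanswered, which is the trade-off the abstract describes. A statistical classifier such as Naive Bayes answers every instance but depends on the amount of sense-tagged training data.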

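Keywords: artificial intelligence; natural language processing; features; word sense; word sense disambiguation; Bayesian classification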

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
