检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
出 处:《中文信息学报》2007年第2期3-8,共6页Journal of Chinese Information Processing
基 金:国家973计划资助项目(2004CB318102)
摘 要:本文依据《现代汉语语法信息词典》中对词语多义的属性特征描述,对《人民日报》语料中155个词语共4996个同形实例进行了粗粒度词义自动消歧实验,同时用贝叶斯算法进行了比较测试。基于词典属性特征的消歧方法在同形层面上准确率达到90%,但召回率偏低。其优点在于两个方面:1)不受词义标注语料库规模的影响;2)对特定词语意义的消歧准确率可达到100%。本文也讨论了适用于不同词类的消歧特征。This paper presents a simple but effective feature-based approach to Chinese word sense disambiguation using the distributional features available from the Grammatical Knowledge-base of Contemporary Chinese. The test data is the sense-tagged corpus of People's Daily. A Naive Bayes classifier is also tried as a comparable statistical method. The feature-based approach achieves precision of 90%, which is comparable to the NB classifier. The striking advantages of the feature-based approach are 1) It is not influenced by the data size, and 2) It can disambiguate some specific words with precision of 100%. The features appropriate for different parts of speech in Chinese WSD are also discussed. This paper demonstrates that sense features described in the lexicon are worth including in WSD.
关 键 词:人工智能 自然语言处理 特征 词义 词义消歧 贝叶斯分类法
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.222