兼类词概率分布计量考察及语法搭配模式在中文信息处理中的应用  

A Study of the Probability Distribution and Grammatical Collocation Patterns of Multi-Category Words in Chinese Information Processing

在线阅读下载全文

作  者:王浩学 徐艳华[1] 

机构地区:[1]鲁东大学文学院,山东 烟台

出  处:《现代语言学》2021年第2期524-529,共6页Modern Linguistics

摘  要:在词性标注的过程中,汉语中兼类词的存在是影响词性标注准确率的主要原因。本研究以三部词典标注一致的78个形名兼类词为测试对象,基于规则和统计相结合的词性标注方法,将统计的兼类词分布概率与语法搭配规则结合起来,利用兼类词语法搭配模式构建规则库,对国家语委现代汉语通用平衡语料库标注的兼类词结果进行修正,准确率可以提高14.57%。In the process of part-of-speech tagging, the existence of multi-category words in Chinese is the main reason that affects the accuracy of part-of-speech tagging. In this study, 78 adjective-noun multi-category words of the same part-of-speech tagging in the three dictionaries are the test objects. The part-of-speech tagging method based on the combination of rules and statistics combines the statistical distribution probability of multi-category words with grammatical collocation rules, and builds a rule database using the grammatical collocation mode of multi-category words. The rule database corrects the results of the multi-category words tagged by the modern Chinese corpus of State Language Commission, and the accuracy rate can be increased by 14.57%.

关 键 词:兼类词 语法搭配 语料库应用 词性标注 

分 类 号:H14[语言文字—汉语]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象