Recognition of Russian base noun phrase based on rules and statistics

Authors: Liu Ying[1], Ji Duo[1], Huang Haihong[2], Cai Dongfeng[1]

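Affiliations: [1] Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang 110136; [2] COMAC Shanghai Aircraft Design and Research Institute, Shanghai 201210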

Source: Journal of Shenyang Aerospace University, 2014, No. 6, pp. 66-72 (7 pages)

Funding: National Key Technology R&D Program of the 12th Five-Year Plan period (Project No. 2012BAH14F00)

Abstract: Recognition of Russian base noun phrases (Base NPs) has so far received little research attention in China, and annotated corpus resources for it are scarce. To address this, a method combining rules and statistics is proposed. Working with limited resources, the method fully exploits the regularities of Russian Base NPs in their part-of-speech (POS) composition: a library of optimal POS collocation patterns is derived statistically from a Russian-Chinese dictionary and used for pattern matching. At the same time, the training corpus needed by the statistical tool does not have to be annotated by hand; it is built automatically from the dictionary and the POS collocation pattern library alone, which saves annotation cost. The rules recall a large proportion of Base NPs, while conditional random fields (CRF) correct ambiguities and errors in the rule-based tagging and handle cases the rules do not cover. Experiments show that the method recognizes Russian base noun phrases effectively, reaching an F-score of 84.14%.
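The rule stage described in the abstract (matching POS sequences against a library of POS collocation patterns) and the F-score used to evaluate it can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tagset (`A` adjective, `N` noun, `V` verb), the contents of `PATTERNS`, the greedy longest-match strategy, and the helper names `match_base_nps`, `spans_to_bio` and `prf` are all hypothetical. The CRF correction step is not shown; the BIO sequences produced here only indicate the kind of automatically labelled data a CRF toolkit could be trained on.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)

# Hypothetical POS collocation pattern library (illustrative only; the paper
# derives its patterns statistically from a Russian-Chinese dictionary).
PATTERNS: Set[Tuple[str, ...]] = {
    ("A", "N"),
    ("A", "A", "N"),
    ("N", "N"),
    ("N",),
}


def match_base_nps(pos_tags: Sequence[str],
                   patterns: Set[Tuple[str, ...]] = PATTERNS) -> List[Span]:
    """Greedy left-to-right matching: at each position try the longest pattern first."""
    max_len = max(len(p) for p in patterns)
    spans: List[Span] = []
    i = 0
    while i < len(pos_tags):
        for length in range(min(max_len, len(pos_tags) - i), 0, -1):
            if tuple(pos_tags[i:i + length]) in patterns:
                spans.append((i, i + length))
                i += length  # continue after the matched phrase
                break
        else:
            i += 1  # no pattern starts here; skip this token
    return spans


def spans_to_bio(n_tokens: int, spans: Iterable[Span]) -> List[str]:
    """Convert phrase spans into B-NP / I-NP / O labels, one per token."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B-NP"
        for k in range(start + 1, end):
            tags[k] = "I-NP"
    return tags


def prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 = 2PR / (P + R)."""
    gold_set, pred_set = set(gold), set(pred)
    correct = len(gold_set & pred_set)
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # "Студент читает новую книгу" ("The student is reading a new book")
    tokens = ["Студент", "читает", "новую", "книгу"]
    pos = ["N", "V", "A", "N"]
    spans = match_base_nps(pos)               # [(0, 1), (2, 4)]
    print(list(zip(tokens, spans_to_bio(len(tokens), spans))))
    print(prf(gold=[(0, 1), (2, 4)], pred=spans))  # (1.0, 1.0, 1.0)
```

Greedy longest match is one simple way to resolve overlapping pattern matches; it is exactly the kind of decision whose errors a statistical model such as a CRF would be expected to correct.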

Keywords: Russian; base noun phrase; part-of-speech collocation pattern; CRF

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
