Preliminary study on Kazak Part-of-Speech automatic tagging (哈萨克语词性自动标注研究初探)  Cited by: 8


Authors: 刘艳[1], 古丽拉·阿东别克, 伊力亚尔[1]

Affiliation: [1] School of Information Science and Engineering, Xinjiang University, Urumqi 830046

Source: Computer Engineering and Applications (《计算机工程与应用》), 2008, No. 20, pp. 242-244 (3 pages)

Funding: National Natural Science Foundation of China (Grant No. 60763005)

Abstract: Part-of-speech tagging plays a key role in many information processing tasks. Kazakh is one of the minority languages in common use in Xinjiang, and the foundational problems of natural language processing for it urgently need to be solved. This paper analyzes the characteristics of Kazakh inflectional (form-building) morphemes. Starting from a first-pass tagging based on a dictionary, it uses statistical methods to train the parameters of a bigram HMM and applies the Viterbi algorithm to perform statistical part-of-speech tagging. Finally, a Kazakh rule base is used to correct the resulting tags. The purely statistical method is compared in testing against the method that is primarily statistical and supplemented by rule-based correction; the results show that the latter improves disambiguation accuracy.

Keywords: Kazakh part-of-speech tagging; inflectional (form-building) morphemes; bigram grammar; HMM

Classification: TP39 [Automation and Computer Technology: Computer Application Technology]
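The pipeline summarized in the abstract (dictionary-based candidate tags, a bigram HMM trained by counting, Viterbi decoding, then rule-based correction) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the tokens `w1`..`w3`, the tag set `N`/`V`, the add-one smoothing, the `lexicon` structure, and the rule `no_verb_before_verb` are all hypothetical placeholders chosen for the example, and the rule is not a real Kazakh grammar rule.

```python
import math
from collections import defaultdict

def train_bigram_hmm(tagged_sentences, smoothing=1.0):
    """Estimate bigram transition and emission log-probabilities by counting."""
    trans = defaultdict(lambda: defaultdict(float))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(float))   # emit[tag][word]
    tag_set, vocab = set(), set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tag_set.add(tag)
            vocab.add(word)
            prev = tag
    tag_list = sorted(tag_set)

    def log_trans(prev, tag):
        # Add-one smoothing (an assumption; the source does not name a scheme).
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + smoothing)
                        / (total + smoothing * len(tag_list)))

    def log_emit(tag, word):
        # One extra vocabulary slot is reserved for unseen words.
        total = sum(emit[tag].values())
        return math.log((emit[tag][word] + smoothing)
                        / (total + smoothing * (len(vocab) + 1)))

    return tag_list, log_trans, log_emit

def viterbi(words, tag_list, log_trans, log_emit, lexicon=None):
    """Return the most probable tag sequence under the bigram HMM.

    `lexicon` maps a word to its candidate tags (the dictionary pass);
    words missing from it may take any tag.
    """
    if not words:
        return []
    lexicon = lexicon or {}
    cands = [sorted(lexicon.get(w, tag_list)) for w in words]
    best = [{t: log_trans("<s>", t) + log_emit(t, words[0]) for t in cands[0]}]
    back = [{}]
    for i in range(1, len(words)):
        scores, pointers = {}, {}
        for t in cands[i]:
            prev = max(best[i - 1], key=lambda p: best[i - 1][p] + log_trans(p, t))
            scores[t] = best[i - 1][prev] + log_trans(prev, t) + log_emit(t, words[i])
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

def apply_rules(words, tags, rules):
    """Post-correct tags; each rule returns a new tag for position i or None."""
    corrected = list(tags)
    for i in range(len(words)):
        for rule in rules:
            new_tag = rule(words, corrected, i)
            if new_tag is not None:
                corrected[i] = new_tag
    return corrected

def no_verb_before_verb(words, tags, i):
    """Hypothetical rule for illustration only: retag V as N when followed by V."""
    if tags[i] == "V" and i + 1 < len(tags) and tags[i + 1] == "V":
        return "N"
    return None

# Toy training data with placeholder tokens (not real Kazakh text).
corpus = [
    [("w1", "N"), ("w2", "V")],
    [("w1", "N"), ("w3", "V")],
    [("w3", "N"), ("w2", "V")],
]
lexicon = {"w1": {"N"}, "w2": {"V"}, "w3": {"N", "V"}}  # w3 is ambiguous

tag_list, log_trans, log_emit = train_bigram_hmm(corpus)
statistical = viterbi(["w1", "w3"], tag_list, log_trans, log_emit, lexicon)
final = apply_rules(["w1", "w3"], statistical, [no_verb_before_verb])
print(statistical, final)
```

Restricting each word to its dictionary candidates keeps the Viterbi search small and mirrors the "dictionary first, statistics second" ordering in the abstract; the rule pass runs last so that it can override statistical decisions, which is the comparison the paper reports on.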

 
