Study and implementation of Kazakh lexical scanner

Cited by: 16

Source: Computer Engineering and Applications, 2008, No. 19, pp. 146-149 (4 pages)

Funding: National Natural Science Foundation of China, Grant No. 60763005

Abstract: This paper studies affix segmentation and stem extraction in the automatic morphological analysis of Kazakh and implements a Kazakh morphological analysis system, KazStemmer, which automatically performs stem segmentation and tagging on Kazakh corpora. The system first analyzes each word to be segmented with a finite-state automaton (FSM). If the analysis succeeds, its output is taken as the segmentation result; otherwise the system falls back to an improved method that combines bidirectional matching, full (omni-word) segmentation, and morphological analysis to separate stems from affixes. Compared with the maximum matching algorithm, this method achieves higher stem-extraction accuracy and faster segmentation. In addition, for searching the stem table the authors adopt, for the first time, an improved character-by-character binary-search dictionary lookup mechanism, which improves the efficiency of stem extraction and significantly affects segmentation speed.
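The abstract does not give the details of the character-by-character binary-search lookup, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the stem table is a lexicographically sorted list and narrows the candidate range one letter at a time with a binary search on the character at the current position. The function names (`longest_stem_prefix`, `_bound`) and the transliterated stem entries are hypothetical and chosen purely for illustration.

```python
from typing import List, Optional


def _key(entry: str, pos: int) -> str:
    # Character of `entry` at `pos`, or "" if the entry ends there.
    # "" compares smaller than any character, matching sorted order.
    return entry[pos] if len(entry) > pos else ""


def _bound(stems: List[str], lo: int, hi: int, pos: int, ch: str, upper: bool) -> int:
    # Binary search inside stems[lo:hi], whose entries share their first
    # `pos` characters. Returns the first index whose character at `pos`
    # is >= ch (upper=False) or > ch (upper=True).
    while lo < hi:
        mid = (lo + hi) // 2
        k = _key(stems[mid], pos)
        if k < ch or (upper and k == ch):
            lo = mid + 1
        else:
            hi = mid
    return lo


def longest_stem_prefix(word: str, stems: List[str]) -> Optional[str]:
    # `stems` must be sorted. Returns the longest table entry that is a
    # prefix of `word`, or None if no entry matches.
    lo, hi = 0, len(stems)
    best = None
    for pos, ch in enumerate(word):
        start = _bound(stems, lo, hi, pos, ch, upper=False)
        end = _bound(stems, start, hi, pos, ch, upper=True)
        if start == end:  # no entry continues with this letter
            break
        lo, hi = start, end
        # The shortest entry in the range sorts first; if its length is
        # pos + 1, then word[:pos + 1] is itself a stem in the table.
        if len(stems[lo]) == pos + 1:
            best = stems[lo]
    return best


# Toy stem table in Latin transliteration (illustrative entries only).
STEMS = sorted(["bal", "bala", "kitap", "mektep"])

word = "balalarga"
stem = longest_stem_prefix(word, STEMS)
suffix = word[len(stem):] if stem is not None else None
```

In this sketch each letter of the input costs two binary searches over a shrinking range, and every stem that is a prefix of the word is discovered in a single pass, which is the kind of property a full-segmentation step could exploit. How the paper's improved mechanism actually organizes the dictionary is not stated in this record.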

Keywords: affix segmentation; finite-state automaton; bidirectional matching; full segmentation

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
