Design of Tibetan word segmentation dictionary with multi-level index

Cited by: 6

Authors: Yao Xu (姚徐)[1,2], Guo Shuni (郭淑妮)[1,2], Li Yonghong (李永宏)[1,2], Yu Hongzhi (于洪志)[1,2]

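Affiliations: [1] China National Institute of Information Technology, Northwest University for Nationalities, Lanzhou 730030; [2] Key Laboratory of China's National Languages and Scripts Information Technology, Northwest University for Nationalities, Lanzhou 730030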

Source: Journal of Computer Applications (《计算机应用》), 2009, Issue B06, pp. 178-180 (3 pages)

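Funding: Open Project of the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; National 863 Program project (AA2006010101)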

Abstract: A Tibetan word segmentation dictionary is the foundation of an automatic Tibetan word segmentation system, and both the size of the dictionary and the quality of the algorithm design directly affect segmentation efficiency. This project first collected all the entries of several Tibetan character and word dictionaries, together with Tibetan punctuation marks, to form a large Tibetan segmentation lexicon of about 100,000 entries. Then, based on the varying lengths of Tibetan words, it built a multi-level index dictionary mechanism specific to Tibetan and designed a Tibetan whole-word binary search method for segmentation. Experimental results indicate that the dictionary has a simple structure, fast segmentation speed, and high lookup performance.

Keywords: Tibetan word segmentation; word segmentation dictionary; Tibetan whole-word binary search; multi-level index

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
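
The abstract names two ideas, a multi-level index and whole-word binary search, without giving the actual data layout. The following is only a minimal sketch of how such a dictionary might look, not the authors' implementation. Everything in it is an assumption made for illustration: the first index level is the number of syllables in a word, the second is the word's first syllable, and the last level is a sorted list of whole words searched with binary search. The names `MultiLevelDictionary`, `syllables`, and `segment` are hypothetical, the matching strategy (forward maximum matching) is swapped in because the abstract does not state one, and punctuation handling is omitted.

```python
from bisect import bisect_left
from collections import defaultdict

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark, used here as the syllable separator


def syllables(text):
    """Split text into syllables on the tsheg mark, dropping empty pieces."""
    return [s for s in text.split(TSHEG) if s]


class MultiLevelDictionary:
    """Hypothetical layout: syllable count -> first syllable -> sorted whole words."""

    def __init__(self, words):
        buckets = defaultdict(lambda: defaultdict(set))
        self.max_len = 0  # longest entry, measured in syllables
        for word in words:
            syls = tuple(syllables(word))
            if not syls:
                continue
            buckets[len(syls)][syls[0]].add(syls)
            self.max_len = max(self.max_len, len(syls))
        # Freeze each bucket into a sorted list so it can be binary-searched.
        self.index = {
            n: {first: sorted(group) for first, group in level.items()}
            for n, level in buckets.items()
        }

    def contains(self, syls):
        """Look up a whole word: two index hops, then one binary search."""
        syls = tuple(syls)
        first = syls[0] if syls else None
        entries = self.index.get(len(syls), {}).get(first, [])
        pos = bisect_left(entries, syls)
        return pos < len(entries) and entries[pos] == syls


def segment(text, dictionary):
    """Forward maximum matching (an assumed strategy) over the dictionary."""
    syls = syllables(text)
    words = []
    i = 0
    while i < len(syls):
        longest = max(1, min(dictionary.max_len, len(syls) - i))
        for n in range(longest, 0, -1):
            candidate = syls[i:i + n]
            # An unknown single syllable is emitted as its own token.
            if n == 1 or dictionary.contains(candidate):
                words.append(TSHEG.join(candidate))
                i += n
                break
    return words
```

Under these assumptions, each lookup compares the whole candidate word against only the entries that share its length and first syllable, which is one plausible reading of why the abstract reports fast queries from a simple structure.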
