Design and Implementation of a Monosyllabic Monomorphemic Words Extraction Method for Tibetan

Authors: Dongzhi Tsering; QI Kun-yu [1,2]; Gongbaojiebu

Affiliations: [1] Gansu Provincial Key Laboratory of Intelligent Processing of National Languages, Northwest Minzu University, Lanzhou 730030, Gansu, China; [2] Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou 730030, Gansu, China; [3] School of Computer Science, Qinghai Normal University, Xining 810000, Qinghai, China

Source: Journal of Northwest Minzu University (Natural Science), 2023, No. 3, pp. 16-24 (9 pages)

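Funding: National Natural Science Foundation of China project "Research on Key Technologies of Document-Level Neural Machine Translation for Long Sequences" (62266038).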

Abstract: To address the scarcity of Tibetan lexical resources and the vagueness of lexical grading, this paper combines a dictionary corpus with a part-of-speech-tagged corpus to design an extraction model for Tibetan monosyllabic monomorphemic words, lays out a detailed technical scheme, and builds a relatively complete dictionary corpus. The work yields a word list of Tibetan monosyllabic monomorphemic words classified by part of speech, and a graded word list derived from relative generality. The lists contain 1414 monosyllabic monomorphemic words in total, covering nouns, verbs, adjectives, adverbs, numerals, and other classes, and a large number of these words belong to more than one part of speech. The results are of significance for the construction of Sino-Tibetan language resource bases.
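The combination of a dictionary corpus with a part-of-speech-tagged corpus can be illustrated with a minimal sketch. It is not the authors' model: the function names, the tag labels, and the toy data are all hypothetical, and the only test applied is syllable count, using the Tibetan tsheg (U+0F0B) as the syllable delimiter. The abstract gives neither the monomorphemic criterion nor the formula for relative generality, so neither is implemented here.

```python
from collections import defaultdict

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)


def normalize(word: str) -> str:
    """Strip surrounding whitespace and leading/trailing tsheg marks."""
    return word.strip().strip(TSHEG)


def is_monosyllabic(word: str) -> bool:
    """True if the word is non-empty and contains no internal tsheg."""
    core = normalize(word)
    return bool(core) and TSHEG not in core


def extract_monosyllabic(dictionary_entries, tagged_tokens):
    """Map each monosyllabic dictionary headword to the set of POS tags
    it carries in the tagged corpus (hypothetical pipeline, not the paper's)."""
    headwords = {normalize(w) for w in dictionary_entries if is_monosyllabic(w)}
    pos_by_word = defaultdict(set)
    for token, pos in tagged_tokens:
        key = normalize(token)
        if key in headwords:
            pos_by_word[key].add(pos)
    return dict(pos_by_word)


def multi_class_words(pos_by_word):
    """Words observed with more than one POS tag."""
    return {w for w, tags in pos_by_word.items() if len(tags) > 1}


# Toy data (assumed for illustration only), written as escapes:
CHU = "\u0f46\u0f74"  # Wylie: chu
MI = "\u0f58\u0f72"   # Wylie: mi
SLOB_GRWA = "\u0f66\u0fb3\u0f7c\u0f56\u0f0b\u0f42\u0fb2\u0fad"  # Wylie: slob grwa (two syllables)

dictionary = [CHU + TSHEG, MI, SLOB_GRWA]
corpus = [(CHU, "n"), (MI, "n"), (MI, "neg"), (SLOB_GRWA, "n")]  # tags are hypothetical

pos_table = extract_monosyllabic(dictionary, corpus)
ambiguous = multi_class_words(pos_table)
```

In this toy run the two-syllable entry is filtered out, and the entry seen with two tags is reported as multi-class, which mirrors the kind of part-of-speech overlap the abstract mentions.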

Keywords: Tibetan monomorphemic words; extraction model; corpus

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]

 
