Authors: Cai Zang Tai (才藏太)[1]; Suo Nan Cai Rang (索南才让)[1]
Affiliation: [1] State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining, Qinghai 810008
Source: Journal of Qinghai Minzu University: Tibetan Version (《青海民族大学学报(藏文版)》), 2023, No. 3, pp. 99-110 (12 pages)
Abstract: The study of a Tibetan phrase classification system is an important component of Tibetan language information processing and a key technical challenge. The research will be applied directly to the construction of a large-scale general-purpose Tibetan corpus and has significant application value in Tibetan character recognition, automatic word segmentation, automatic proofreading, information retrieval, text classification, and machine translation. It provides the impetus and foundation for future Tibetan information dissemination and exchange and for research on Tibetan language intelligence. Phrases are an important feature and a main component of Tibetan grammar. As in other languages, Tibetan phrases follow definite grammatical rules and are formed by combining content words and function words. The relationships between words, and between words and function words, are the focus of Tibetan phrase research and one of the directions worth attention in the study of Tibetan phrase structure. Classifying Tibetan phrases is one of the fundamental tasks of natural language processing and has attracted sustained attention from researchers in recent years. This paper extracts a large number of Tibetan phrases from a large-scale Tibetan corpus, analyzes their internal structure and grammatical functions in depth, and, with reference to the classification systems for Tibetan phrases in the linguistic literature, classifies them according to the principle that the scheme should be easy for computers to analyze and process automatically. It also specifies the tag codes for Tibetan phrase category units in information processing.
Classification number: H214 [Language and Script: Minority Languages]