Authors: SE Cha-jia; Gongbao Cai-rang; CAI Rang-jia [1] (Key Laboratory of Tibetan Information Processing of the Ministry of Education, and Provincial Key Laboratory of Tibetan Information Processing and Machine Translation, Qinghai Normal University, Xining 810008, China)
Affiliation: [1] Key Laboratory of Tibetan Information Processing of the Ministry of Education, and Provincial Key Laboratory of Tibetan Information Processing and Machine Translation, Qinghai Normal University, Xining, Qinghai 810008, China
Source: Journal of Qinghai Normal University (Natural Science Edition), 2018, No. 1, pp. 12-16 (5 pages)
Funding: National Natural Science Foundation of China (61063033, 61662061); Ministry of Education Key Laboratory project (MOE Letter [2010] No. 52); Ministry of Education "Innovative Research Team Development Program" rolling support (IRT_15R40); Qinghai Provincial Key Laboratory projects (2013-Z-Y17, 2014-Z-Y32, 2015-Z-Y03); Qinghai Science and Technology Department project (2015-SF-520)
Abstract: New Tibetan words keep appearing in technology, news, the internet, and other domains, posing a challenge for automatic Tibetan text analysis. This paper applies a sequence-labeling approach to Tibetan new word recognition. First, rules for time words, numerals, and suffixed components are embedded into the statistical model; then statistical learning is used to build models over a corpus of 150,000 Tibetan sentences spanning news, law, novels, poetry, primary and secondary school textbooks, place names, and other genres; finally, the models are tested on an open corpus of 3,087 sentences containing 12,348 new words in total. The experiments show that embedding the rules into a maximum entropy model yields precision, recall, and F-score 1.772, 0.3905, and 1.0912 percentage points higher, respectively, than embedding them into an HMM; for Tibetan new word recognition, the maximum entropy model outperforms the HMM.
Classification: TP393.0 [Automation and Computer Technology: Computer Application Technology]
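The sequence-labeling formulation described in the abstract can be illustrated with a minimal sketch of one of the two models the paper compares: an HMM with Viterbi decoding. This is not the paper's implementation; the B/I/O tag scheme (Begin/Inside/Outside a new word), the hand-set probabilities, the lexicon-membership emission feature, and the toy syllables are all illustrative assumptions, not trained on the paper's 150,000-sentence corpus.

```python
# Toy HMM for new-word recognition as B/I/O sequence labeling.
# All probabilities are hand-set for illustration only.
import math

TAGS = ["B", "I", "O"]

# Hypothetical start and transition log-probabilities.
START = {"B": math.log(0.3), "I": math.log(1e-6), "O": math.log(0.7)}
TRANS = {
    "B": {"B": math.log(0.1), "I": math.log(0.6), "O": math.log(0.3)},
    "I": {"B": math.log(0.1), "I": math.log(0.5), "O": math.log(0.4)},
    "O": {"B": math.log(0.3), "I": math.log(1e-6), "O": math.log(0.7)},
}

def emission(tag, syllable, lexicon):
    """Toy emission: syllables absent from the lexicon are more likely
    to lie inside a new word (tags B or I)."""
    in_lex = syllable in lexicon
    if tag == "O":
        return math.log(0.8 if in_lex else 0.2)
    return math.log(0.2 if in_lex else 0.8)

def viterbi(syllables, lexicon):
    """Return the most probable B/I/O tag sequence for one sentence."""
    # trellis[i][t] = (best log-prob of reaching tag t at position i, backpointer)
    trellis = [{t: (START[t] + emission(t, syllables[0], lexicon), None)
                for t in TAGS}]
    for i in range(1, len(syllables)):
        row = {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: trellis[i - 1][p][0] + TRANS[p][t])
            row[t] = (trellis[i - 1][prev][0] + TRANS[prev][t]
                      + emission(t, syllables[i], lexicon), prev)
        trellis.append(row)
    # Backtrack from the best final tag.
    tag = max(TAGS, key=lambda t: trellis[-1][t][0])
    path = [tag]
    for i in range(len(syllables) - 1, 0, -1):
        tag = trellis[i][tag][1]
        path.append(tag)
    return path[::-1]

lexicon = {"known1", "known2"}
print(viterbi(["known1", "newA", "newB", "known2"], lexicon))
# → ['O', 'B', 'I', 'O']: the two out-of-lexicon syllables are grouped as one new word
```

The paper's stronger model, maximum entropy, would replace the fixed emission table with a per-position classifier over richer contextual features, with the rule-derived cues (time words, numerals, suffixed components) supplied as additional features.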