Authors: 阿里木·赛买提 (ALIM Samat); 沙丽瓦尔·阿里木 (SHALIWAER Alimu); 吐尔根·依不拉音 (TURGUN Ebrayim)[1]; 段雪明 (DUAN Xue-ming); 古丽尼格尔·阿不都外力 (GULINIGEEr Abuduwaili); 麦合甫热提 (Maihefureti); 吾守尔·斯拉木 (WUSHUER Silamu)[1]
Affiliations: [1] Laboratory of Multi-language Information Technology, School of Information Science and Engineering, Xinjiang University, Urumqi 830046, Xinjiang, China; [2] The Open University of Xinjiang, Urumqi 830049, Xinjiang, China; [3] Xinjiang Information Technology Company Limited (新疆科大讯飞信息科技有限责任公司), Urumqi 830015, Xinjiang, China
Source: Journal of Northeast Normal University (Natural Science Edition) (《东北师大学报(自然科学版)》), 2022, No. 2, pp. 76-80 (5 pages)
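Funding: Open Fund of the Key Laboratory of the Xinjiang Uygur Autonomous Region (2016D03023, 2018D04019); National Natural Science Foundation of China (61662077, 61762084); Research Project of the State Language Commission (ZDI135-54).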
Abstract: Uyghur-Chinese personal-name datasets are scarce and hard to obtain. To address this, the paper proposes extracting Uyghur-Chinese name pairs from ordinary Uyghur-Chinese sentence pairs by combining Fast align word alignment with named entity recognition (NER). Translated Uyghur names are prone to out-of-vocabulary (OOV) words, indecent characters, and otherwise inappropriate renderings. To mitigate this, N-gram language models of orders 1 to 4 are trained on the Chinese side of the Uyghur-Chinese name data, the data are scored with the language model, and the best-2 results are retained. The refined data, obtained with manual assistance, are then used to train a character-level end-to-end neural Uyghur-Chinese name translation model. Experiments show that the model combined with the proposed preprocessing improves BLEU by 0.95 points over the model without it, and that indecent characters and inappropriate renderings are noticeably reduced.
Keywords: machine translation; OOV; Uyghur-Chinese personal names; Fast align; character-level end-to-end neural network
Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]
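The language-model filtering step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the class `CharNGramLM`, the helper `best_k`, the add-one smoothing, the equal-weight interpolation of orders 1 to 4, the reading of "best-2" as keeping the two highest-scoring candidates, and the toy name list. The abstract does not state which toolkit or scoring formula was used.

```python
import math
from collections import Counter


class CharNGramLM:
    """Hypothetical character-level N-gram model (orders 1..max_order).

    Uses add-one smoothing and equal-weight interpolation across orders;
    both choices are assumptions, not taken from the paper.
    """

    def __init__(self, max_order=4):
        self.max_order = max_order
        self.ngrams = Counter()    # counts of n-grams of every order
        self.contexts = Counter()  # counts of the preceding histories
        self.vocab = set()

    def _pad(self, name):
        # Pad so that every character has a full-length history.
        return ["<s>"] * (self.max_order - 1) + list(name) + ["</s>"]

    def train(self, names):
        for name in names:
            chars = self._pad(name)
            self.vocab.update(chars)
            for i in range(self.max_order - 1, len(chars)):
                for n in range(1, self.max_order + 1):
                    history = tuple(chars[i - n + 1:i])
                    self.ngrams[history + (chars[i],)] += 1
                    self.contexts[history] += 1

    def score(self, name):
        """Length-normalised log-probability; higher means more name-like."""
        chars = self._pad(name)
        v = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        total = 0.0
        for i in range(self.max_order - 1, len(chars)):
            p = 0.0
            for n in range(1, self.max_order + 1):
                history = tuple(chars[i - n + 1:i])
                count = self.ngrams[history + (chars[i],)]
                p += (count + 1) / (self.contexts[history] + v)
            total += math.log(p / self.max_order)
        return total / (len(chars) - self.max_order + 1)


def best_k(lm, candidates, k=2):
    """Keep the k candidates the language model scores highest."""
    return sorted(candidates, key=lm.score, reverse=True)[:k]


if __name__ == "__main__":
    # Toy training data, invented for this sketch.
    lm = CharNGramLM(max_order=4)
    lm.train(["阿里木", "阿里甫", "艾力", "吐尔根", "吾守尔", "买买提"])
    print(best_k(lm, ["啊哩母", "阿里母", "阿里木"], k=2))
```

Candidates built from characters that rarely appear in the training names receive lower scores, which is the intuition behind using the Chinese-side language model to filter out unusual or inappropriate renderings.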