Authors: HUA Guo-cai-rang (华果才让), BAN Ma-bao (班玛宝), SANG Jie-duan-zhu (桑杰端珠), CAI Rang-jia (才让加) [1,2,3,4]
Affiliations: [1] School of Computer Science, Qinghai Normal University, Xining, Qinghai 810016, China; [2] Qinghai Key Laboratory of Tibetan Information Processing and Machine Translation, Xining, Qinghai 810008, China; [3] Qinghai Tibetan Information Processing Engineering Technology Research Center, Xining, Qinghai 810008, China; [4] State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining, Qinghai 810008, China
Source: Computer Simulation (《计算机仿真》), 2021, No. 12, pp. 391-396 (6 pages)
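Funding: National Natural Science Foundation of China (61662061, 61063033); National Key R&D Program of China (2017YFB1402200); Qinghai Key Laboratory of Tibetan Information Processing and Machine Translation (2020-ZJ-Y05).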
Abstract: Machine translation is one of the main branches of natural language processing and plays an important role in promoting political, economic, and cultural exchange. The quality of Chinese-Tibetan machine translation still needs improvement: Chinese-to-Tibetan output is prone to grammatical errors, and mistranslated Tibetan function words are especially common. Analyzing the types of function-word errors in Chinese-Tibetan machine translation output and studying methods for correcting them automatically is the most effective way to improve Chinese-Tibetan machine translation performance. Based on an analysis of these error types, the paper pre-trains BERT on large-scale Tibetan text. The pre-trained model is then fine-tuned specifically on the function-word error types found in the translation output, yielding a BERT-based Tibetan function-word error correction model for post-processing Chinese-Tibetan machine translation. In experiments, the model's error-correction precision, recall, and F1 score reach 95.64%, 93.27%, and 94.44%, respectively, indicating good correction performance on Tibetan function words.
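Keywords: machine translation; masked language model; pre-training; fine-tuning; Tibetan function word error correction
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]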
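Note: The three scores reported in the abstract can be cross-checked against one another. The sketch below assumes that the first figure (95.64%) is precision and that F1 is the usual harmonic mean of precision and recall; the record itself does not state the formula, and the function and variable names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures as reported in the abstract, in percent.
# Treating 95.64 as precision is an assumption (see the note above).
reported_precision = 95.64
reported_recall = 93.27
reported_f1 = 94.44

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.2f}%")  # prints "computed F1 = 94.44%"
```

Under these assumptions the recomputed value agrees with the reported F1 to two decimal places, so the three figures are mutually consistent.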