Authors: Hu Zelin, Gao Yi, Li Miao[3], Cao Yichao
Affiliations: [1] School of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi 341000, China; [2] Yunnan Minority Language Working Committee Office, Kunming, Yunnan 650499, China; [3] Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031, China
Source: Journal of Kunming University of Science and Technology (Natural Science), 2023, No. 3, pp. 85-92 (8 pages)
Funding: National Key Research and Development Program of China (2017YFD0701600); Gannan Normal University Doctoral Research Start-up Fund (13SJJ202130).
Abstract: With the development of machine learning, the efficiency and accuracy of text translation models have steadily improved, but good translation results still depend on large amounts of high-quality parallel corpora. Since the outbreak of the pandemic, China has continued to expand domestic demand and build a strong domestic market; ties among ethnic groups are closer than before, which makes translation between languages especially important. Mongolian is a relatively widely used minority language, yet the meanings of its word forms vary greatly and it lacks sufficient parallel corpora for training, so existing translation models perform poorly on it. To address these problems, this paper does the following: (1) It proposes character-level sentence segmentation, which alleviates the out-of-vocabulary problem caused by scarce parallel corpora and reduces computational cost. (2) It uses denoising autoencoding to force the model to learn more robust representations of the input features, improving generalization. (3) It uses an unsupervised dual iterative translation model that trains Chinese-to-Mongolian and Mongolian-to-Chinese translation together in a dual, unsupervised, iterative fashion, achieving both language modeling and bidirectional translation. Comparing the BLEU scores of this model and a conventional Transformer model trained on the same dataset shows that the proposed model performs better and translates more accurately.
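As a rough illustration of steps (1) and (2) in the abstract, the sketch below segments a sentence into character tokens and then corrupts the token sequence the way a denoising autoencoder's input is typically corrupted. The abstract does not specify the noise model; random token dropping plus a local shuffle is a common choice in unsupervised translation work and is assumed here. The function names `char_segment` and `add_noise`, the `▁` space marker, and all parameter values are hypothetical and not taken from the paper.

```python
import random


def char_segment(sentence, space_symbol="▁"):
    """Split a sentence into character tokens.

    Whitespace is replaced by a visible marker (an assumption) so the
    original spacing can be restored after translation.
    """
    return [space_symbol if ch.isspace() else ch for ch in sentence.strip()]


def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    """Corrupt a token list for denoising training (assumed noise model).

    1. Drop each token independently with probability drop_prob.
    2. Shuffle locally: sort by position plus a random offset drawn from
       [0, shuffle_window], so tokens stay near their original position.
    """
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        # Never return an empty input for a non-empty sentence.
        kept = [rng.choice(tokens)]
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=keys.__getitem__)
    return [kept[i] for i in order]


if __name__ == "__main__":
    clean = char_segment("机器 翻译")
    noisy = add_noise(clean, rng=random.Random(0))
    # A denoising autoencoder would be trained to map `noisy` back to `clean`.
    print(clean, noisy)
```

In this setup the model sees the corrupted sequence as input and the clean sequence as the target, so it cannot simply copy tokens and has to learn something about the structure of the language.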